The purpose of this research is threefold. The first goal is to
model the electroglottographic (EGG) signal; the second is to
investigate its potential in measuring parameters of the vibrations of
the vocal folds; and the last is to explore its use in laryngeal
pathology detection.

The  EGG  signal  is  believed  to  be  closely  related  to  the  lateral 
area  of  contact  between  the  vocal  folds.  Observations  of  ultra-high- 
speed films  of  the  vibrations  of  the  vocal  folds  presented  a  plausible 
case  for  such  a  conjecture  and  resulted  in  a  descriptive  model  of  this 
signal.  In  this  work  we  carried  this  idea  further  and  constructed  a 
computer  or  mathematical  model  that  simulates  this  signal. 

The  resulting  model  is  very  flexible  and  permits  simulation  of  EGG 
signals  for  vibrations  of  the  vocal  folds  under  various  conditions.  In 
Chapter  3  we  provide  examples  of  the  model  output  for  many  of  these 
conditions, such as varying the phase between the lower and upper edges
of the folds as well as the opening and closing angles. It is shown
that at a certain phase lag, the simulated EGG signal is remarkably
similar to the EGG of vocal fry. Another important aspect of this
model is its ability to simulate the effects of a vocal fold nodule on
the EGG signal. Other phenomena, such as mucus, can also be simulated
using this model.

Our second goal, using the EGG signal to measure different
vibration parameters, was also accomplished. In Chapter 5 and the
Appendix  we  discuss  different  algorithms  that  were  developed  to 
accomplish  these  measurements.  A  new  parameter  called  EGG  Closing  Time 
or  "ECT"  for  short  was  also  proposed.  The  results  are  discussed  and 
organized  in  tables  and  presented  in  Chapter  5. 

Finally,  we  proposed  and  developed  a  new  method  for  feature 
extraction  for  laryngeal  pathology  detection  based  on  the  probability 
mass  function  (PMF)  measured  from  the  EGG  signal.  The  results  show  that 
the  PMFs  for  subjects  with  a  normal  larynx  are  different  from  those  with 
a laryngeal pathology.

CHAPTER 1
INTRODUCTION 


1.1  Speech  Production  and  Perception 

Many  members  of  the  animal  kingdom  produce  voice,  and  some  of  them, 
such  as  whales  and  porpoises,  are  thought  to  communicate  using  clicks 
and  pop  sounds.  However,  the  ability  to  code  and  transmit  information 
vocally seems to be unique to the human species. Some have gone further
and suggested that humans should be labeled "homoloquens" [1], in recognition
of this remarkable achievement.

Humans  use  voice  not  only  for  communication  through  speech.  They 
have  also  used  it  in  artistic  activities  including  singing  and 
theatrical  performances.  However,  the  basic  function  of  the  voice 
production  system  in  humans  is  communication  via  speech.  This  is 
readily  obvious  when  one  realizes  that  without  the  ability  to  speak,  one 
cannot  engage  in  many  endeavors  and  is  excluded  from  many  professions 
such  as  acting,  teaching,  singing,  lecturing,  preaching,  etc.  Actually, 
one  is  hard  pressed  to  find  any  human  activity  that  doesn't  involve,  to 
some  extent,  the  act  of  speaking  or  communication  via  speech. 

Speech  is  the  most  efficient  method  of  communication  between 
humans.  This  fact  revolutionized  the  field  of  electronic  communications 
and  was  the  driving  force  behind  the  invention  of  the  telephone 
system.  Now,  terrestrial  and  satellite  communication  systems  are 
commonplace  and  serve  to  connect  people  throughout  the  globe  and  people 
traveling in outer space.

One aim of this dissertation is to add to our understanding of
the  process  of  human  speech.  Figure  1.1  shows  the  human  vocal  system. 
The  lungs  act  as  the  power  source  that  drives  the  entire  system.  The 
lungs  are  used  primarily  to  extract  oxygen  from  air,  but  also  as  a 
reservoir  of  air  that  drives  the  vocal  system.  The  increase  in  lung 
pressure  below  the  vocal  folds  forces  them  to  open,  and  air  rushes 
through  the  vocal  cavities.  The  flow  of  air  through  the  opening  between 
the  vocal  folds  (glottis)  is  different  for  different  sounds.  For  voiced 
sounds,  the  air  flow  is  periodically  interrupted  by  the  vibration  of  the 
vocal  folds.  According  to  the  aerodynamic-myoelastic  theory  of 
phonation  [2],  the  vibration  of  the  folds  is  directly  connected  to  the 
air  flow  through  the  glottis  and  is  the  steady-state  result  of  the 
interplay of two forces. On one hand, the pressure built up in the
subglottal space causes the air to push the folds apart and flow through
the glottis. On the other hand, the velocity of air passing through the
glottis results in a drop in pressure across the fold opening, producing
a suction effect that pulls the folds back together and closes the
glottis. This phenomenon is referred to as the Bernoulli effect.
Tension and stiffness in the folds are, perhaps, an even more important
factor in controlling vocal fold vibration. Other theories [3] have
been  postulated  to  explain  the  fold  vibration;  however,  these  theories 
have  not  been  fully  accepted.  The  vibration  of  the  folds  generates  an 
acoustic  wave  that  travels  through  the  vocal  and  nasal  tract  cavities. 
The  shape  of  the  vocal  and  nasal  tract  cavities  filters  the  acoustic 
pressure  wave  that  is  radiated  finally  as  a  pressure  waveform  at  the 
lips and nostrils. For unvoiced sounds a constriction is formed either
at the glottal level or at some location in the supraglottal level. The
air flow through this constriction becomes turbulent. This turbulence
is in turn filtered by the vocal and nasal tracts as described
earlier. Fant [5,6] provided an excellent mathematical treatment of the
entire process.

Figure 1.1 Schematic diagram of the human
vocal mechanism. Adapted from [4].

The  system  described  above  is  controlled  by  higher  order  processes 
in  the  human  brain  [7].  Abstract  thought  is  converted  into  language  by 
the  linguistic  or  artistic  centers  in  the  cerebral  cortex.  The  commands 
from  these  centers  are  transmitted  to  the  motor  cortex.  The  motor 
cortex  gives  a  series  of  commands  to  the  respiratory,  laryngeal,  and 
articulatory muscles. The extrapyramidal system, which includes some
parts  of  the  cerebral  cortex,  the  cerebellum,  and  the  basal  ganglia, 
provides  additional  regulation  of  the  activity  of  the  respiratory, 
laryngeal,  and  articulatory  musculature.  The  activity  of  these  muscles 
results  in  movements  of  the  phonatory  organs  which  produce  a  series  of 
sounds  known  as  voice.  This  voice  is  a  pressure  wave  that  activates  the 
hearing  mechanism  of  the  listener.  The  hearing  mechanism  converts  this 
physical  signal  into  neural  messages  that  are  translated  into  a  language 
form  and  then  an  abstract  thought  in  the  listener's  brain. 

The  voice  is  also  transmitted  to  the  speaker's  brain  where  it  is 
checked  against  the  preplanned  sound  by  the  feedback  mechanism  of  the 
speaker.  The  feedback  mechanism  operates  via  the  deep  and  superficial 
sensory  receptors  which  provide  information  about  the  muscular 
contraction  and  movements  of  the  phonatory  organs.  Figure  1.2 
illustrates this process.

Figure 1.2 Schematic representation of voice production and its
control system. Adapted from [7].


A  voice  disorder  may  result  if  any  part  of  this  system  is 
damaged.  However,  in  this  research  we  are  most  interested  in  the 
laryngeal  structure  section  of  the  system  and  more  precisely  the  vocal 
folds  mechanism.  Although  this  part  of  the  system  seems  to  be  the 
easiest  to  study,  there  is  little  understanding  of  the  operation  of  the 
vocal folds. One reason is the relative
inaccessibility of the larynx, which holds the vocal folds. According to
Ludlow  [8],  "Identifying  research  needs  for  the  assessment  of  phonatory 
functioning  and  vocal  pathologies  is  not  difficult,  the  gaps  in 
knowledge  are  awesome"  (p.  3). 

1.2  Research  Issues 


The inaccessibility of the laryngeal structure has prevented us
from  fully  understanding  or  observing  the  vibrations  of  the  folds.  The 
electroglottographic  (EGG)  signal  represents  a  potential  electrical 
probe  that  can  be  used  to  study  events  inside  the  larynx. 

The  EGG  signal  is  believed  to  be  directly  related  to  the  folds' 
lateral  area  of  contact,  and  a  descriptive  model  has  been  proposed  based 
on  this  conjecture.  However,  no  adequate  mathematical  model  that  could 
be  implemented  on  a  computer  has  yet  been  developed  for  this  signal. 
This  dissertation  develops  such  a  model.  We  believe  that  such  a  model 
is  essential  for  understanding  many  aspects  of  the  folds'  vibration  as 
well  as  other  events  at  the  laryngeal  level.  We  also  believe  that  given 
the  EGG  signal,  this  model  can  be  further  developed  to  predict  the 
glottal configuration. This result could be very significant in
implementing a successful articulatory-based vocoder.


So far, the vocal folds' vibration parameters have been
measured from high-speed films of the
folds' vibration. This is an expensive, laborious, and extremely
complicated  process,  requiring  a  high  level  of  expertise.  Also,  the 
process  imposes  severe  restrictions  on  subject  selection  and  the 
possible  uttered  sounds  that  can  be  filmed.  On  the  other  hand,  the 
electroglottograph  is  inexpensive,  noninvasive,  easy  to  use,  and  does 
not  impose  any  restriction  on  the  uttered  sounds  or  the  subject.  Given 
all  these  desired  features,  is  it  possible  to  use  such  a  device,  and  in 
particular,  its  output  signal,  to  measure  the  vibration  parameters? 

Finally,  since  the  EGG  is  related  to  the  contact  area  between  the 
folds,  is  it  possible  to  use  the  EGG  to  detect  any  organic  changes  in 
the  folds'  tissue  caused  by  a  certain  pathology?  Also,  using  the  EGG, 
is  it  possible  to  detect  these  pathologies  in  their  early  development? 
Can  we  differentiate  between  different  pathologies?  In  this  research  we 
attempt  to  answer  some  of  these  questions. 

1.3  Description  of  Chapters 

Chapter  2  gives  a  brief  description  of  the  physiology  of  the  vocal 
folds.  It  also  discusses  some  of  the  methods  used  to  detect  vocal 
folds'  disorders. 

Chapter  3  presents  a  model  of  the  EGG  signal  based  on  vocal  folds' 
contact  area.  Simulation  results  of  the  EGG  signal  for  various 
conditions  and  phenomena  are  also  presented.  Chapter  4  details  the  data 
collection  and  measurement  system  we  used  to  collect  our  data  base  of 
synchronized area, speech, and EGG signals. The methods and algorithms
developed for measuring the vibration parameters, along with the
measurement  results,  are  presented  in  Chapter  5. 

Chapter  6  presents  our  measurement  results  for  parameters  believed 
to  indicate  the  presence  of  a  pathology.  We  also  present  our  proposed 
new  method  for  feature  extraction  for  laryngeal  pathology  detection. 
Finally,  Chapter  7  summarizes  our  results  and  proposes  further 
development  of  some  areas  of  this  research. 


CHAPTER  2 
THE  VOCAL  FOLDS 


The  purpose  of  this  chapter  is  to  provide  an  introduction  to  the 
vocal  folds'  structure  along  with  pertinent  studies  that  investigated 
different  pathologies  associated  with  this  structure.  Previous  methods 
that  were  utilized  to  detect  these  pathologies  are  also  presented. 

2.1  Vocal  Folds'  Structure 

As  mentioned  earlier,  the  vocal  folds  are  the  basic  vibrator  in  the 
human  speech  production  system.  They  reside  inside  the  larynx  at  the 
top  of  the  trachea,  where  they  also  protect  the  trachea  and  lungs  from 
the  intrusion  of  foreign  material. 

The  vibration  of  the  vocal  folds  is  essential  for  voiced  sound 
production  in  English  speech.  The  structure  of  the  folds  is  unique  and 
is  intricately  controlled  by  the  activity  of  the  laryngeal  muscles.  The 
pair  of  folds  is  capable  of  producing  a  great  variety  of  fundamental 
frequencies,  intensities,  and  tonal  qualities  compared  to  the  multiple 
strings  needed  in  many  musical  instruments. 

Hirano  studied  the  structure  of  the  folds  extensively  [7].  Figure 
2.1  shows  a  frontal  section  through  the  middle  of  the  membranous  portion 
of  the  human  vocal  fold.  From  a  histological  point  of  view  the  folds' 
structure can be divided into five different layers:

(1) The thin and stiff capsule that maintains the shape of the
folds, called the epithelium. This layer is made of squamous
type cells.

(2) The lamina propria superficial layer. This layer is somewhat
like a mass of soft gelatin consisting of loose fibrous
components, and is occasionally referred to as Reinke's space.

(3) The lamina propria intermediate layer. This layer resembles
a bundle of soft rubber consisting chiefly of elastic fibers.

(4) The deep layer of the lamina propria. This layer consists of
collagenous fibers resembling a bundle of cotton thread.

(5) The vocalis muscle. This muscle resembles a bundle of rubber
bands that constitutes the main body of the vocal fold.

Figure 2.1 A frontal section of a human vocal fold
through the middle of the membranous portion [7].

From  a  mechanical  point  of  view  the  vocal  folds  are  divided  into 
three  different  sections:  the  cover,  the  transition  layer,  and  the 
body,  which  is  the  vocalis  muscle.  Each  of  these  sections  differs  in 
mechanical  properties  from  the  other  sections.  The  control  of  these 
sections  is  also  different.  While  the  first  two  sections,  which  are 
commonly referred to as the mucous membrane, are controlled passively,
the  body  of  the  vocalis  muscle  is  controlled  both  actively  and 
passively. 

As  in  any  other  mechanical  system,  lubrication  of  the  folds  is 
essential to proper and sustained functioning [9]. The ventricular
glands  squirt  mucus  on  the  vibrating  folds.  Excessive  mucus  can 
sometimes  act  as  an  additional  layer  of  the  vocal  fold  [10]. 
Krishnamurthy  [11]  found  that  the  mucus  can  influence  the 
electroglottographic  (EGG)  signal  considerably,  providing  a  highly 
conductive path for the radio frequency signal used by the EGG device.

According to Hirano [12], most pathologies originate in one layer
of  the  vocal  folds. 

2.2  Detection  Methods  for  Vocal  Fold  Disorders 

Voice  disorders  can  be  classified  into  two  general  categories, 
functional  and  organic.  Functional  disorders  are  associated  with 
incorrect  use  or  misuse  of  otherwise  healthy  and  nondefective  vocal 
organs.  The  incorrect  use  of  the  vocal  folds  is  sometimes  related  to 
psychological  problems.  However,  in  some  cases,  misuse  is  intentional, 
to  produce  a  unique  quality  as  in  the  case  of  actors  or  singers. 

Organic  disorders  result  from  organic  changes  to  the  structure  of 
the folds. It is interesting to note that functional misuse of the
voice, such as excessive shouting at a football game, can result in an
organic disorder due to the development of a nodule.

Many  methods  exist  for  detecting  or  evaluating  voice  disorders. 
However,  the  vast  majority  have  been  used  only  in  research.  They  are 
complicated  and  usually  are  not  readily  available  in  voice  clinics.  The 
need  exists  for  simple  acoustic  analysis  procedures  to  evaluate  or 
detect  different  pathologies. 

We  now  turn  our  attention  to  the  methods  used  to  either  detect  or 
evaluate  voice  disorders. 

2.2.1  Aerodynamic  Tests 

Four  aerodynamic  parameters  are  usually  utilized  in  this 
procedure:  subglottal  pressure,  supraglottal  pressure,  glottal 
impedance, and the volume velocity of the air flow at the glottis.

The subglottal pressure is measured by locating a pressure
transducer below the vocal folds. As expected, this is a difficult
procedure and rarely used in living human beings. Five methods [7] are
used for locating the pressure transducer below the glottis. In a
tracheal puncture, a spinal needle or a modified version is inserted into
the trachea through the cervical skin. A transglottal catheter can also
be  inserted  into  the  subglottal  space  through  the  mouth;  however,  this 
procedure  interferes  with  the  normal  vibration  of  the  folds. 
Measurement  of  subglottal  pressure  is  most  easily  implemented  for 
patients  with  tracheostomies,  through  the  opening  already  present,  but 
this  is  a  very  small  patient  population. 

Koike  and  Perkins  [13]  utilized  an  ultra-miniature  solid-state 
pressure  transducer  placed  directly  in  the  subglottal  space.  This 
procedure  does  not  interfere  with  the  folds'  vibration  and  the 
transducer  has  a  high  frequency  response,  but  the  frequency  response  of 
the  transducer  is  temperature  dependent. 

The  preceding  procedures  employ  direct  measurement  techniques  of 
the  subglottal  pressure.  The  esophageal  balloon  uses  an  indirect  method 
of  measuring  the  subglottal  pressure.  In  this  procedure  a  balloon 
connected  to  a  tube  is  inserted  into  the  esophagus  through  the  nose. 
The  intraesophageal  pressure  measured  is  related  to  the  intratracheal 
pressure,  which  is  approximately  equal  to  the  subglottal  pressure. 
However,  factors  other  than  the  intratracheal  pressure,  such  as  the 
expanding  and  contracting  lung  volume,  and  contraction  of  esophageal 
muscles  during  swallowing,  contribute  to  the  intraesophageal  pressure. 
This technique, therefore, is valid only in some limited conditions.

Researchers [14-16] reported an increase in subglottal pressure for
patients  with  carcinoma  and  recurrent  laryngeal  nerve  paralysis. 

The subglottal pressure along with the supraglottal pressure and
the glottal resistance can be used to calculate the mean flow rate. The
mean flow rate is usually measured using devices such as the spirometer,
pneumotachograph, and the hot-wire anemometer.

The  glottal  resistance  cannot  be  measured  directly,  but  it  is 
calculated  by  dividing  the  subglottal  pressure  by  the  mean  flow  rate. 
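
The calculation just described can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name, the units (cm H2O for pressure, liters per second for flow), and the sample values are assumptions made here, not measurements from this study.

```python
def glottal_resistance(subglottal_pressure, mean_flow_rate):
    """Return glottal resistance as subglottal pressure over mean flow rate.

    Units are assumed for illustration: pressure in cm H2O and flow in
    liters per second, giving resistance in cm H2O/(L/s).
    """
    if mean_flow_rate <= 0:
        raise ValueError("mean flow rate must be positive")
    return subglottal_pressure / mean_flow_rate


# Hypothetical values chosen only to show the arithmetic.
print(glottal_resistance(subglottal_pressure=8.0, mean_flow_rate=0.25))
```

Because the resistance is a ratio, a rise in mean flow rate at constant subglottal pressure appears as a lower computed resistance.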

The mean flow rate is generally greater for pathological subjects
than for normal subjects [17-20], and in general it can be used to
monitor the progress of treatment.

The  instantaneous  flow  rate,  usually  referred  to  as  the  volume 
velocity  wave  (v-v),  can  also  be  measured.  Berouti  [21]  applied  linear 
prediction  inverse-filtering  techniques  to  obtain  the  volume  velocity 
waveform.  Rothenberg  [22]  obtained  the  volume  velocity  waveform  using  a 
pneumotachograph  mask  to  inverse  filter  the  oral  volume  velocity.  On 
the other hand, Sondhi [23] used a reflectionless acoustical tube to
inverse  filter  the  acoustic  wave  at  the  lips. 

Although  Berouti  reported  peculiar  v-v  wave  shapes  for  pathological 
cases,  the  v-v  waveform  has  not  been  used  extensively  for  pathology 
detection.  However,  the  residue  signal  derived  from  the  v-v  signal  has 
been  used  as  a  potential  source  of  information  regarding  disorders  of 
the  vocal  folds,  as  we  will  see  later  in  the  discussion. 

Using  the  aerodynamic  measurements  mentioned  above,  phonatory 
parameters  can  be  measured  to  detect  pathologies.  One  parameter  is  the 
maximum phonation time (MPT), where the subject is instructed to sustain
a vowel /a/ as long as possible following a deep inspiration. For
nonsinger  patients,  the  sustained  phonation  is  made  at  a  comfortable 
fundamental  frequency  and  intensity.  For  singers,  the  two  parameters 
range  and  level  can  be  controlled.  Studies  on  pathological  subjects 
[18,24]  indicate  a  decrease  in  the  MPT  from  that  of  normal  subjects. 
This  test  can  also  be  used  to  monitor  progress  of  treatment. 

2.2.2  Examination  of  the  Vocal  Folds'  Vibration 

Normal vibration of the vocal folds is a prerequisite for a normal
sounding voice. Moore [25] and Moore et al. [26] have observed that it
is  the  folds'  vibration  and  not  the  disease  itself  that  determines  the 
phonatory  quality  of  the  resulting  sound.  This  fact  suggests  a 
nonuniqueness  between  perceptual  acoustics  of  the  sound  and  the 
underlying  pathology,  if  different  pathologies  result  in  similar  folds' 
vibration. 

Many  methods  exist  for  observing  the  vibration  of  the  vocal 
folds.  These  techniques  include  stroboscopy,  ultra-high-speed 
photography,  photo-electric  glottography,  electroglottography,  and 
ultrasound  glottography. 

Stroboscopic  examination  of  the  vibrating  fold  allows  the  examiner 
to  freeze  the  image  of  the  folds  at  the  same  position  in  the  vibratory 
cycle  when  the  strobe  light  is  synchronized  with  the  folds'  vibratory 
cycle.  If  the  strobe's  flashes  are  emitted  at  frequencies  slightly  less 
than  the  folds'  vibrating  frequency,  a  slow  motion  effect  of  the 
vibrating  folds  is  produced.  This  technique  can  also  be  applied  using 
x-rays instead of light beams. Still photographs can also be produced
using this technique. These photographs allow evaluation of the size,
position, shape, orientation, and color of the laryngeal structure, but
they do not show fine details of the vibratory cycle. This is the basic
difference between stroboscopy and ultra-high-speed photography.
Stroboscopy can be, and is, implemented in clinical settings.
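
The frozen-image and slow-motion effects of stroboscopy follow from simple sampling arithmetic, sketched below in Python. The function names and the example frequencies are assumptions chosen for illustration. Each flash illuminates the folds at some phase of the vibratory cycle; when the flash rate equals the fold frequency the phase never changes, and when it is slightly lower the phase advances a little with every flash.

```python
def apparent_frequency(fold_hz, strobe_hz):
    """Apparent vibration rate (Hz) seen when the strobe flashes
    slightly slower than the folds vibrate."""
    return fold_hz - strobe_hz


def phase_at_flashes(fold_hz, strobe_hz, n_flashes):
    """Fraction of the vibratory cycle (0 to 1) lit by each flash."""
    return [(k * fold_hz / strobe_hz) % 1.0 for k in range(n_flashes)]


# Hypothetical example: folds vibrating at 120 Hz.
print(phase_at_flashes(120.0, 120.0, 4))   # synchronized strobe
print(apparent_frequency(120.0, 118.0))    # strobe slightly slower
```

With a synchronized strobe every flash lands on the same phase, so the image appears frozen. With the strobe at 118 Hz, the folds appear to complete two cycles per second.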

On the other hand, ultra-high-speed photography of the vibrating
folds  [27,28]  at  speeds  of  up  to  5000  frames/sec  provides  a  detailed 
description  of  the  folds'  vibration.  The  film  produced  allows  a  study 
of  the  vibration  of  the  folds  during  a  glottal  cycle.  This  technique  is 
used  in  this  study  and  is  discussed  in  detail  in  Chapter  4.  Using  this 
method,  different  parameters  of  the  folds'  vibration  can  be  measured  as 
will  be  illustrated  later  in  this  section.  However,  this  technique  can 
only  be  implemented  in  research  laboratories  since  it  is  too  complex  and 
expensive  for  clinical  use. 

Another technique used is photo-electric glottography, or
photoglottography [29,30]. This method utilizes a light source placed
superior to the glottis and a photomultiplier placed against the neck
just below the larynx. The intensity of the light detected by the
photomultiplier is directly related to the area of the glottis between
the vocal folds. The location of the light transmitter and receiver can
be reversed. This technique suffers from several shortcomings: the
light density distribution within the vocal folds may not be constant,
and the changing cross-sectional area of the vocal folds in an
anterior-posterior plane may result in an uneven illumination of the folds.
Also, slight differences in the location of the detector cause
different waveforms. However, this technique has the advantage of an
unrestricted range of phonation.

Ultrasound glottography [31] is another method of monitoring vocal
fold  vibration.  The  technique  is  based  on  the  reflection  of  the  sound 
wave  at  the  interface  between  two  media  with  different  specific  acoustic 
impedance.  This  is  a  new  technique  and  research  is  being  conducted  to 
assess  its  usefulness. 

On the other hand, electroglottography [10,32-34] is based on the
electrical transmission of a high frequency current through the tissues
at the glottal level. The vibrating vocal folds constitute a
varying impedance path that modulates a radio frequency current (3.5
MHz) transmitted between two electrodes placed on either side of the
thyroid cartilage. The Mind-Machine Interaction Laboratory has
conducted  extensive  studies  [32-35]  using  this  device  in  synchrony  with 
high  speed  photography  and  the  speech  signal.  The  studies  indicate  the 
versatility  of  the  resulting  EGG  signal.  It  was  found  that  the  EGG 
signal  can  indicate  the  fundamental  frequency  of  the  folds'  vibration, 
the  closed  and  open  phase  regions,  the  closing  time  of  the  folds,  and 
the  duty  cycle,  along  with  other  interesting  features,  such  as  the  use 
of  the  EGG  signal  as  the  excitation  signal  for  the  LPC  synthesizer  [35]. 

Fourcin  [10]  used  the  EGG  signal  for  pathology  discrimination.  He 
studied  the  fundamental  frequency  distribution  from  normal  and  abnormal 
subjects.  His  studies  indicate  differences  in  the  frequency  range  and 
distribution  between  normals  and  subjects  with  voice  pathology.  Smith 
[36]  used  a  discriminant  analysis  procedure  on  the  distribution  of  the 
roots  of  the  autocorrelation  linear  prediction  analysis  of  the  EGG 
signal. His procedure was successful 74% of the time in discriminating
between normals and pathological subjects. He also noted some time
domain characteristics of the EGG signal associated with the presence of
pathology. These include double periodicity, changes in the rising
slope, and the often shorter glottal open phase for patients with
pathology.

2.2.2.1  Vocal  fold  vibration  parameters 

Using  the  techniques  discussed  earlier  many  vibration  related 
parameters  can  be  measured.  These  parameters  can  be  divided  as  follows 
[12]: 

Horizontal  excursion  of  the  edge  of  the  vocal  folds.  This 
parameter  is  hard  to  measure  since  the  edge  of  the  fold  is  not  a  fixed 
location  on  the  fold;  it  is  the  most  medially  located  part  of  the 
vibrating  fold.  In  chest  register,  a  phase  difference  exists  along  the 
length  and  depth  of  the  vocal  folds  during  their  vibration  [11],  and  it 
is  not  possible  to  define  the  horizontal  excursion.  However,  in 
falsetto  phonation,  when  the  folds  are  taut  and  the  entire  fold  moves 
medially  along  its  length,  horizontal  excursion  can  be  measured 
easily. High-speed filming has a definite advantage when measuring
this parameter due to the high sampling frequency of the vibrating folds
and the two-dimensional projection present on the face of the film.

Limited  excursion  of  one  of  the  folds  or  both  can  indicate  possible 
fold  paralysis. 

Glottal  width.  This  parameter  is  measured  routinely  in  high  speed 
filming.  Since  the  width  between  the  two  folds  is  not  uniform  along 
their  length,  it  is  measured  at  five  or  more  specific  points  along  the 
fold's  length.  However,  this  parameter  could  be  defined  to  indicate  the 
distance between the middle of the membranous part of the two folds.

Glottal area. The term glottis refers to the area between the
edges of the two folds. The area function characterizes the time history
of  the  vibration  of  the  folds.  High  speed  films  can  give  an  absolute 
area  measurement  when  a  grid  in  the  same  focal  plane  as  the  folds  is 
superimposed  on  the  laryngeal  film.  This  procedure  is  described  fully 
in  Chapter  4.  Nonclosure  in  certain  types  of  phonation  indicates  the 
possible  presence  of  pathology. 

Fundamental  frequency  of  vibration.  The  fundamental  frequency  of 
vibration  is  the  inverse  of  the  period  of  one  glottal  cycle.  The 
glottal  cycle  is  defined  by  an  opening  phase,  during  which  the  glottal 
area  increases;  a  closing  phase,  where  the  glottal  area  is  decreasing; 
and  a  closed  phase,  during  which  the  glottis  is  closed  [37].  In  the 
event  that  no  closure  is  present,  the  fundamental  frequency  is  the 
inverse  of  the  time  span  between  two  similar  events  in  the  glottal  area. 

Excessive  variability  of  the  fundamental  frequency  is  an  indication 
of  a  possible  pathology. 
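
As a minimal sketch of these two ideas, the Python code below computes a per-cycle fundamental frequency from the times of the same event in consecutive cycles, and one simple measure of its variability. The function names, the choice of event, and the variability measure (mean absolute difference between consecutive periods divided by the mean period) are assumptions made here for illustration; the text above does not prescribe a specific measure.

```python
def cycle_periods(event_times):
    """Durations (s) between the same event in consecutive glottal cycles."""
    return [t1 - t0 for t0, t1 in zip(event_times, event_times[1:])]


def cycle_frequencies(event_times):
    """Per-cycle fundamental frequency (Hz): the inverse of each period."""
    return [1.0 / period for period in cycle_periods(event_times)]


def relative_jitter(event_times):
    """Mean absolute difference between consecutive periods, divided by
    the mean period. This measure is an assumption, not from the source."""
    periods = cycle_periods(event_times)
    if len(periods) < 2:
        raise ValueError("need at least three event times")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_period = sum(periods) / len(periods)
    return (sum(diffs) / len(diffs)) / mean_period


# Hypothetical event times (s) for a perfectly regular 128 Hz vibration.
regular = [k / 128.0 for k in range(5)]
print(cycle_frequencies(regular))
print(relative_jitter(regular))
```

A perfectly periodic vibration gives a variability of zero; larger values indicate more cycle-to-cycle irregularity.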

Opening  phase,  closing  phase,  and  closed  phase.  As  mentioned 
earlier,  the  glottal  duty  cycle  is  divided  into  three  regions:  opening 
phase,  closing  phase,  and  closed  phase.  The  open  phase  is  defined  as 
the  time  the  glottis  has  non-zero  area,  and  in  terms  of  the  definition 
it  is  usually  equal  to  the  combined  time  of  opening  phase  and  closing 
phase.  When  there  is  no  closure,  the  open  phase  is  equal  to  the  entire 
cycle  time.  Again  the  absence  of  closure  in  some  types  of  phonation 
indicates  the  possible  presence  of  pathology. 
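
The phase definitions above can be illustrated with a short Python sketch that splits one cycle of a sampled glottal area function into its three phases. The function name, the threshold parameter, and the sample data are hypothetical. The sketch assumes the input holds exactly one cycle with a single opening-closing pulse, and it takes the sample of maximum area as the boundary between the opening and closing phases.

```python
def phase_durations(area, sample_period, threshold=0.0):
    """Split one glottal cycle of sampled area into its three phases.

    Assumes `area` holds exactly one cycle with a single pulse; each
    sample stands for `sample_period` units of time. Samples at or
    below `threshold` count as closed.
    """
    cycle = len(area) * sample_period
    open_idx = [i for i, a in enumerate(area) if a > threshold]
    if not open_idx:
        return {"opening": 0.0, "closing": 0.0, "closed": cycle}
    first, last = open_idx[0], open_idx[-1]
    # The sample of maximum area separates opening from closing.
    peak = max(range(first, last + 1), key=lambda i: area[i])
    opening = (peak - first + 1) * sample_period
    closing = (last - peak) * sample_period
    return {"opening": opening, "closing": closing,
            "closed": cycle - opening - closing}


# Hypothetical area samples (arbitrary units), 0.5 ms per sample.
phases = phase_durations([0, 0, 1, 2, 3, 4, 2, 0, 0, 0], 0.5)
print(phases, "open phase:", phases["opening"] + phases["closing"])
```

When no sample falls to the threshold, the closed duration is zero and the open phase equals the entire cycle time, in agreement with the definition above.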

Open quotient (OQ), speed quotient (SQ), and speed index (SI).
Timcke, von Leden, and Moore [38] defined two parameters to represent
the vibration of the vocal folds: the open quotient and the speed
quotient. The two parameters are defined as

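\[
\mathrm{OQ} = \frac{\text{duration of the open phase}}{\text{duration of the entire glottal cycle}}
\]

\[
\mathrm{SQ} = \frac{\text{duration of the opening phase}}{\text{duration of the closing phase}}
\]

The speed index is defined in terms of the speed quotient as

\[
\mathrm{SI} = \frac{\mathrm{SQ} - 1}{\mathrm{SQ} + 1}
\quad \text{or} \quad
\mathrm{SI} = \frac{\text{duration of opening phase} - \text{duration of closing phase}}{\text{duration of opening phase} + \text{duration of closing phase}}
\]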


The  speed  quotient  varies  directly  with  the  intensity  of  the  sound 
produced  while  the  open  quotient  varies  inversely  with  the  sound 
intensity.  Hildebrand  [39]  conducted  an  intensive  study  of  the  effects 
of  intensity  and  frequency  on  the  open  and  speed  quotients  in  normal 
subjects. Her results showed that the OQ increased significantly on
transition from low to medium frequency, while it remained constant or
decreased slightly at higher frequencies. Increasing the intensity from
low to medium levels had little influence on the OQ, but increasing
intensity from medium to high level decreased the OQ significantly.

The  SQ,  on  the  other  hand,  varied  inversely  with  frequency.  For 
frequency  change  from  low  to  medium  the  SQ  decreased  significantly, 
while  it  remained  unchanged  for  frequency  transition  from  medium  to  high 
level. The SQ had a nonsignificant increase with increasing intensity.

The speed index as defined is related to the SQ. However, unlike
the SQ, which varies from 0 to ∞, it has a range of -1 to 1. Also, for
two waveforms that have the same basic shape except that one is the
time reverse of the other, the speed index will have the same absolute
value (with opposite sign), while the speed quotient gives two different
values that have a product of 1. The SI is therefore symmetric about
zero and easier to visualize than the SQ.
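
The three quotients can be illustrated with a minimal Python sketch. The function name `glottal_quotients` and the example durations are hypothetical, chosen only to show the definitions above and the behavior of SQ and SI when the opening and closing phases are swapped.

```python
def glottal_quotients(opening, closing, closed):
    """Return (OQ, SQ, SI) from phase durations given in the same time unit."""
    if opening <= 0 or closing <= 0 or closed < 0:
        raise ValueError("opening and closing must be positive, closed non-negative")
    open_phase = opening + closing          # time with non-zero glottal area
    period = open_phase + closed            # one full glottal cycle
    oq = open_phase / period
    sq = opening / closing
    si = (sq - 1.0) / (sq + 1.0)
    return oq, sq, si


# Illustrative durations in msec (not measured data).
oq, sq, si = glottal_quotients(opening=3.0, closing=1.0, closed=4.0)
# Same cycle with the opening and closing phases interchanged.
oq_r, sq_r, si_r = glottal_quotients(opening=1.0, closing=3.0, closed=4.0)
print(oq, sq, si)
print(oq_r, sq_r, si_r)
```

For the interchanged cycle the two SQ values multiply to 1, while the SI simply changes sign.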

Contact area. This is the lateral area of contact between the
folds while they touch each other. This area changes during the closed
phase. The change cannot be viewed in the high speed films, but the
electroglottographic waveform seems to give a good indication of it.

Other parameters such as amplitude, mucosal wave, homogeneity,
regularity  or  periodicity  of  successive  vibrations,  and  symmetry  of  the 
folds  are  important.  Moore  and  Thompson  [37]  considered  the  symmetry 
and  equal  amplitude  of  the  vocal  fold  vibration  as  one  of  the  necessary 
conditions  for  the  production  of  normal  phonations. 

2.2.3  Acoustic  Analysis 

Traditionally,  laryngologists,  phoniatricians,  and  speech 
pathologists  have  relied  on  two  basic  techniques:  listening  to  the  voice 
and  viewing  the  larynx  with  the  aid  of  a  mirror  or  laryngoscope. 
Changes  in  voice  quality  often  result  from  laryngeal  pathology,  and 
experienced  laryngologists  are  able,  to  some  extent,  to  detect  some 
pathologies  by  administering  listening  tests.  However,  this  is  a 
subjective  procedure  and  different  laryngologists  often  give  different 
diagnoses for the same patient [40].

Acoustic analysis is currently gaining popularity. The noninvasive
nature of this technique is its greatest asset. It lends itself to
screening of a large population for early detection of voice pathology
and to following the effects of voice therapy. It does not require
close cooperation of the subject and can be performed offline from tape
recordings. The method tries to provide objective and quantitative
measures of the voice. The following discussion lists the parameters
used for pathology detection and evaluation of certain aspects of the
acoustical signal of the voice.

2.2.3.1  Fundamental  frequency  related  measures 

Lieberman [41,42] showed that the average pitch period perturbation
is larger for pathologic speakers than for normal speakers. He proposed
two measures:

(a) Pitch perturbation (PP)

This parameter is defined as the difference between the
durations of successive pitch periods in the speech signal:

$$ PP = \Delta P = P_i - P_{i+1} $$

(b) Pitch perturbation factor (PPF)

This parameter is the relative frequency of pitch period
perturbations larger than 0.5 msec occurring in a steady vowel
sound:

$$ PPF = \frac{\text{number of } |\Delta P| > 0.5 \text{ msec}}{\text{total number of } \Delta P} $$

(c) Relative average perturbation

Koike [43] observed that normal subjects phonating a
steady vowel sound exhibit a slow and relatively smooth
change in the pitch period. He defined a measure called the
relative average perturbation (RAP), where

$$ RAP = \frac{\dfrac{1}{N-2} \sum_{i=2}^{N-1} \left| \dfrac{P(i-1) + P(i) + P(i+1)}{3} - P(i) \right|}{\dfrac{1}{N} \sum_{i=1}^{N} P(i)} $$

and P(i), i = 1, 2, ..., N, denotes the successive pitch periods.
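
The perturbation measures above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used in the cited studies: the function names are hypothetical, the perturbation is compared to the 0.5 msec threshold by magnitude, the RAP uses a three-point average centered on each interior period, and the period values are invented for the example.

```python
def pitch_perturbations(periods):
    """Differences between successive pitch periods, P(i) - P(i+1)."""
    return [periods[i] - periods[i + 1] for i in range(len(periods) - 1)]


def pitch_perturbation_factor(periods, threshold=0.5):
    """Fraction of perturbations whose magnitude exceeds the threshold (msec)."""
    deltas = pitch_perturbations(periods)
    if not deltas:
        raise ValueError("need at least two pitch periods")
    return sum(1 for d in deltas if abs(d) > threshold) / len(deltas)


def relative_average_perturbation(periods):
    """Mean deviation from a 3-point running average, relative to the mean period."""
    n = len(periods)
    if n < 3:
        raise ValueError("need at least three pitch periods")
    deviation = sum(
        abs((periods[i - 1] + periods[i] + periods[i + 1]) / 3.0 - periods[i])
        for i in range(1, n - 1)
    ) / (n - 2)
    mean_period = sum(periods) / n
    return deviation / mean_period


# Invented pitch periods in msec, for illustration only.
periods_ms = [8.0, 8.2, 7.4, 8.1, 8.0]
print(pitch_perturbation_factor(periods_ms))
print(relative_average_perturbation(periods_ms))
```

A perfectly steady sequence of periods gives a PPF and a RAP of zero; both grow as the period-to-period variation increases.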

Koike  [44]  used  a  contact  throat  microphone  to  record  a  signal 
related  to  the  folds'  vibration.  He  proposed  another  measure  called  the 
correlogram. This measure is based on the quasi-periodic amplitude
modulation  observed  in  the  steady  vowel  sounds  of  pathological 
speakers.  He  found  that  the  correlograms  of  pathological  and  normal 
speakers  are  generally  distinguishable  from  one  another. 

2.2.3.2  Spectral  measures 

Several  investigators  attempted  to  relate  the  spectral  content  of 
speech  to  certain  pathologies  in  voice  production.  Yanagihara  [45,46] 
proposed a four-type classification of the hoarse voice. In Type I,
the regular harmonic components are mixed with noise components,
primarily in the formant region of the vowels. In Type II, the
second formants of /e/ and /i/ are dominated by noise components, with
additional noise in the region above 3000 Hz. In Type III, the
second formants of /e/ and /i/ are totally replaced by noise
components and the high frequency noise in the region above 3000 Hz is
further intensified. In Type IV, the second formants of /a/, /e/,
and /i/ are replaced by noise components.

Another method is the long-time-average spectrum (LTAS) used by
Frokjaer-Jensen and Prytz [47]. This method utilizes a 400 channel,
real time, narrow band analyzer to compute the spectra of speech
averaged over 45 sec. Several parameters of the resulting spectra are
examined. These parameters include the fundamental frequency F0; the
first formant F1; Fmin, the frequency of the minimum spectral level
between F0 and F1; Fmax and Lmax, the frequency and level of the maximum
peak in the entire spectrum; the quotient of F0 and F1; the harmonic
richness, defined as the energy below 1 kHz divided by the energy above
1 kHz; and other parameters. Recently, Kitzing [48] conducted experiments using
the  LTAS  method  to  investigate  qualities  of  disturbed  voice  due  to 
aberrant  functioning  in  organically  healthy  vocal  organs.  In  these 
experiments  ten  experienced  voice  therapists  each  produced  four 
different  recognizable  voice  qualities.  He  concluded  that  the  most 
important  measures  were  the  harmonic  richness,  the  spectral  slope 
inclination  in  the  first  formant  region,  and  the  ratio  between  peak 
level  of  the  fundamental  and  first  formant  region. 

2.2.3.3  Multidimensional  approach 

So  far  the  discussion  has  centered  on  examining  single  acoustic 
parameters  as  to  their  potential  for  detecting  voice  disorders.  Some 
researchers  reached  the  conclusion  that  a  more  robust  method  would  be  to 
use multidimensional analysis using multiple acoustic parameters.

Working from this premise, Davis [49,50] constructed a voice profile
using six parameters (PPQ, APQ, EX, PA, SFF, and SFR). The pitch
perturbation quotient (PPQ) is similar to the RAP measure by Koike,
except that the averaging window is taken over five points instead of
three. The amplitude perturbation quotient (APQ) is defined in the same
way as the RAP, but the averaging is done using the amplitude values of
each pitch period rather than the period durations. The EX is the
coefficient of excess defined as

$$ EX = \frac{E[(X - \bar{X})^4]}{\left( E[(X - \bar{X})^2] \right)^2} - 3 $$



This  excess  coefficient  is  measured  from  the  residue  signal  after 
inverse  filtering  the  speech  signal.  Although  this  signal  does  not  have 
a  physical  correlate,  it  is  found  that  for  pathologic  subjects  the 
excess  coefficient  is  higher  than  for  normal  subjects. 
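
As a rough illustration of the coefficient of excess, the following Python sketch estimates it from a list of samples by replacing the expectations with sample averages. The function name and the two test signals are hypothetical; in the setting described above the input would be the residue signal obtained by inverse filtering.

```python
def coefficient_of_excess(samples):
    """Sample estimate of E[(X - mean)^4] / (E[(X - mean)^2])^2 - 3."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in samples) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("signal has zero variance")
    return m4 / (m2 * m2) - 3.0


# A two-level signal has a flat distribution and a negative excess.
square_like = [-1.0, 1.0, -1.0, 1.0]
# A signal dominated by one sharp pulse has a large positive excess.
impulsive = [0.0] * 99 + [10.0]
print(coefficient_of_excess(square_like))
print(coefficient_of_excess(impulsive))
```

The excess is zero for a Gaussian distribution, so the subtraction of 3 makes the measure a comparison against Gaussian-like amplitude statistics.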

The  pitch  amplitude  (PA)  is  the  pitch  period  peak  in  the  residue 
signal  autocorrelation  function.  The  PA  is  high  for  voiced  sounds  from 
normal  subjects,  since  the  PA  is  a  measure  of  voicing.  However,  for 
breathy voiced sounds associated with some pathological speakers, the PA
is low, indicating weak periodicity due to abnormal vocal fold vibration
and hence the presence of audible noise.
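
One simple way to picture the PA is as the largest value of the normalized autocorrelation of the residue within a plausible range of pitch lags. The sketch below is an assumption-laden illustration, not Davis's implementation: the function name, the lag range, the normalization by the zero-lag energy, and both test signals are hypothetical.

```python
import random


def pitch_amplitude(residue, min_lag, max_lag):
    """Peak of the autocorrelation, normalized by zero-lag energy, over a lag range."""
    n = len(residue)
    energy = sum(x * x for x in residue)
    if energy == 0:
        raise ValueError("signal has zero energy")
    return max(
        sum(residue[i] * residue[i + lag] for i in range(n - lag)) / energy
        for lag in range(min_lag, max_lag + 1)
    )


# Strongly periodic residue: one pulse every 10 samples.
pulse_train = [1.0 if i % 10 == 0 else 0.0 for i in range(100)]
# Noise-like residue from a seeded generator, standing in for a breathy voice.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(100)]

print(pitch_amplitude(pulse_train, 5, 15))
print(pitch_amplitude(noise, 5, 15))
```

The periodic signal gives a value close to 1, while the noise-like signal gives a much smaller peak, matching the qualitative behavior described above.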

The  spectral  flatness  of  the  residue  inverse  filter  (SFF)  is 
defined  as  the  ratio  in  decibels  (dB)  of  the  geometric  mean  of  the 
spectrum  to  the  arithmetic  mean  of  the  spectrum.  The  more  noise  the 
signal  contains,  the  greater  is  its  spectral  flatness.  So  for  unvoiced 
sounds,  where  the  glottis  is  open,  the  spectral  flatness  is  large.  For 
voiced  sounds  the  spectral  flatness  is  not  as  large.  For  voiced  sounds 
of  pathological  speakers,  the  spectral  flatness  is  expected  to  be 
greater  than  that  of  normal  subjects,  indicating  improper  closure  of  the 
folds. 
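
The spectral flatness definition can be sketched as follows. This is a minimal illustration that assumes the input is a list of strictly positive power-spectrum values, so the decibel ratio uses 10 log10; the function name and the two example spectra are hypothetical.

```python
import math


def spectral_flatness_db(power_spectrum):
    """Ratio of geometric mean to arithmetic mean of a power spectrum, in dB."""
    if not power_spectrum or any(p <= 0 for p in power_spectrum):
        raise ValueError("power spectrum values must be positive")
    n = len(power_spectrum)
    geometric_mean = math.exp(sum(math.log(p) for p in power_spectrum) / n)
    arithmetic_mean = sum(power_spectrum) / n
    return 10.0 * math.log10(geometric_mean / arithmetic_mean)


# A flat (noise-like) spectrum versus one dominated by a single strong component.
flat = [1.0] * 8
peaky = [100.0] + [1.0] * 7
print(spectral_flatness_db(flat))
print(spectral_flatness_db(peaky))
```

With this convention a perfectly flat spectrum gives 0 dB, the largest possible value, and spectra with pronounced harmonic peaks give increasingly negative values.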

The spectral flatness of the residue signal (SFR) is similar to the
SFF. The voiced sound from normal subjects is expected to have the F0
harmonics in the spectrum of the residue signal. When the voiced sound
becomes more noiselike, as in the case of pathologic speakers, the
residue spectrum does not have the expected F0 harmonics. The results of
analysis of the pathologic voice agree with this observation. Davis
reported a detection probability of 95.2% in a closed test and 67.4% in
an open test.

Other  researchers  [51,52]  derived  fourteen  acoustic  measures  from 
the  glottal  sound  waveform  or  volume  velocity  found  by  inverse  filtering 
the  speech  signal.  They  related  these  acoustic  parameters  to  other 
phonatory  function  factors,  such  as  vibratory  pattern  of  the  vocal 
folds,  physical  properties  of  the  folds,  aerodynamic  measures,  and 
psycho-acoustic  parameters  of  the  voice.  Their  system  was  successful 
70-80%  of  the  time  in  separating  normal  subjects  from  subjects  with 
pathological  disorders. 

2.2.4  Psycho-Acoustic  Evaluation 

As  mentioned  earlier,  speech  pathologists  and  otolaryngologists  are 
frequently  able  to  determine  vocal  fold  pathologies  by  listening  to  the 
voice.  This  is  called  psycho-acoustic  evaluation. 

The  psycho-acoustic  parameters  are  the  voice  pitch,  loudness, 
laryngeal quality, and resonance [53]. A voice is judged abnormal when
any of these parameters deviates from the range expected for persons
of the same age, sex, and cultural background.

The  pitch  is  the  perceptual  correlate  of  the  frequency  of  the 
folds'  vibration,  also  referred  to  as  the  fundamental  frequency.  It  is 
judged  atypical  or  defective  if  it  is  too  high,  too  low,  monotonous,  or 
tremulous. 

Loudness  is  the  sensation  related  to  the  amplitude  of  the  molecular 
motion  in  the  sound  wave.  It  is  judged  abnormal  when  a  voice  is  too 
loud  or  too  quiet  in  relation  to  a  specific  environmental  situation  or 
when  the  loudness  variation  is  inappropriate  to  the  meaning  of  the 
utterance. 

Quality of the voice is not as easily defined. Moore [53] used an
excellent example for explaining this parameter:

    A listener can identify the tones of a trombone, a saxophone,
    and a violin even when all three are producing sounds at the
    same pitch and equal loudness.

The quality of voice is thought to be related to the weighting of the
spectral components of the speech signal. Based on this assumption,
Yanagihara [45] developed his criteria for classifying the hoarse voice
as mentioned in the previous section.

Voice quality disorders encompass a wide range of voice
disorders. A voice with a quality disorder can be breathy, rough or
harsh, hoarse, husky, throaty, metallic, hypernasal, or denasal, among
other terms used to describe the quality of the voice. The descriptive
nature of these terms lends itself to different interpretations by
different voice pathologists. Many researchers around the world are
currently working to provide a universally accepted method for defining
these terms, and a standard procedure to measure the extent of these
disorders using a universally accepted scale.

Ishiki  [54],  working  towards  this  goal,  used  a  semantic 
differential  technique  for  factor  analysis  of  hoarseness,  using 
seventeen  polar-opposite  adjectives  as  the  scales.  Three  pairs 
represented  an  evaluation  factor,  another  three  pairs  represented  a 
potency  factor,  an  activity  factor  was  represented  by  three  other  pairs, 
and  eight  pairs  were  used  to  describe  hoarseness.  Based  on  the  analysis 
of  sixteen  recorded  samples  of  speech,  he  concluded  that  hoarse  voice 
consisted  of  at  least  four  factors.  The  first  factor,  and  most  dominant 
of  the  four,  is  related  to  the  roughness  (R),  rumbling,  or  rattling 
quality.  The  second  factor  is  related  to  the  breathiness  (B)  quality. 
The third factor is related to the asthenic (A) quality, and the final
factor is related to the degree (D) of hoarseness. Ishiki also
associated a four-point grading scale with these factors: "0" for
normal, "1" for slight, "2" for fair, and "3" for extreme.

The psycho-acoustic evaluation procedure suffers from
non-standardization of the terms used to describe the pathologic voice.
The classification of a voice as hoarse, breathy, harsh, rough, etc. has
different meanings to different voice pathologists. In an
attempt to overcome this problem, the Committee for Phonatory Function
Tests of the Japan Society of Logopedics and Phoniatrics (CPFTJSLP)
proposed a system called "GRBAS." This system is used mainly to
evaluate hoarseness. It consists of five scales: Grade (G), Rough (R),
Breathy (B), Asthenic (A), and Strained (S).

The grade (G) represents the overall degree of hoarseness or voice
abnormality. Scale "R" represents a psycho-acoustic impression of the
irregularity  of  vocal  fold  vibrations.  It  also  corresponds  to  the 
fluctuations  in  the  fundamental  frequency  and  amplitude  of  the  glottal 
source  sound.  Scale  "B"  is  related  to  the  psycho-acoustic  impression  of 
the  extent  of  air  leakage  through  the  glottis.  Scale  "A"  is  related  to 
weakness  or  lack  of  power  in  the  voice.  Scale  "S"  is  related  to  the 
impression  of  the  hyperfunctional  state  of  phonation.  It  is  usually 
related  to  an  abnormally  high  pitch  frequency,  noise  in  the  high 
frequency  range,  and  richness  in  high  frequency  harmonics. 

Using  the  "GRBAS"  system  the  hoarse  voice  can  be  evaluated  by  a 
four-point  grading  for  each  scale:  "0"  for  normal,  "1"  for  voice  with 
slight  hoarseness,  "2"  for  voice  with  moderate  hoarseness,  and  "3"  for 
the  voice  with  extreme  hoarseness. 

The  GRBAS  system  is  still  subjective.  The  CPFTJSLP  provides  a 
standard  tape  which  has  typical  voice  samples  of  the  above  quality 
defects  represented  by  the  GRBAS  scale  to  provide  a  certain  measure  of 
objectivity  among  specialists  using  this  system.  The  users  of  this 
system  are  expected  to  possess  highly  trained  ears. 


CHAPTER  3 
ELECTROGLOTTOGRAPH  WAVEFORM  MODELING 


3.1  Introduction 


The  EGG  signal  is  the  output  of  a  device  called  the 
electroglottograph.  The  device  operates  in  the  following  fashion.  Two 
electrodes  are  placed  on  either  side  of  the  thyroid  cartilage.  One 
electrode acts as a transmitter and the other one as a receiver. A
radio-frequency (RF) current (300 kHz - 5 MHz) is applied to one electrode.
This  RF  current  is  transmitted  across  the  larynx  and  is  detected  by  the 
other  electrode  (receiver).  The  impedance  change  resulting  from 
vibrations  of  the  vocal  folds  modulates  the  amplitude  of  the  RF 
current.  The  amount  of  modulation  is  directly  related  to  the  amount  of 
impedance  change  across  the  larynx.  On  the  receiver  side,  the  detected 
signal  is  demodulated  to  produce  the  EGG  signal. 
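
The modulation and demodulation steps described above can be imitated numerically. The sketch below is only a toy simulation under several assumptions: the impedance variation is a 200 Hz sinusoid with a 10% modulation depth, the carrier is 300 kHz sampled at 8 points per cycle, and the receiver is modeled as a full-wave rectifier followed by an average over one carrier cycle. Real electroglottographs perform these steps with analog circuitry, and none of the names below come from the source.

```python
import math


def am_modulate(envelope, samples_per_cycle):
    """Multiply a slowly varying envelope by a cosine carrier."""
    return [a * math.cos(2.0 * math.pi * i / samples_per_cycle)
            for i, a in enumerate(envelope)]


def am_demodulate(signal, samples_per_cycle):
    """Envelope detector: full-wave rectify, then average over one carrier cycle."""
    p = samples_per_cycle
    # Mean of |cos| on the sample grid, used to rescale the average to the envelope.
    gain = sum(abs(math.cos(2.0 * math.pi * k / p)) for k in range(p)) / p
    rectified = [abs(s) for s in signal]
    window = sum(rectified[:p])
    recovered = [window / (p * gain)]
    for i in range(p, len(rectified)):
        window += rectified[i] - rectified[i - p]   # slide the window by one sample
        recovered.append(window / (p * gain))
    return recovered


CARRIER_HZ = 300_000          # lower end of the range quoted in the text
SAMPLES_PER_CYCLE = 8         # assumed sampling density
SAMPLE_RATE = CARRIER_HZ * SAMPLES_PER_CYCLE
VOICE_HZ = 200                # assumed vibration rate of the folds
DEPTH = 0.1                   # assumed modulation depth

envelope = [1.0 + DEPTH * math.sin(2.0 * math.pi * VOICE_HZ * i / SAMPLE_RATE)
            for i in range(24_000)]              # 10 msec of signal
recovered = am_demodulate(am_modulate(envelope, SAMPLES_PER_CYCLE), SAMPLES_PER_CYCLE)
max_error = max(abs(r - e) for r, e in zip(recovered, envelope))
print(max_error)
```

Because the envelope changes very little within one carrier cycle, the recovered signal tracks the assumed impedance variation closely, which is the property the device relies on.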

The  change  in  impedance  across  the  larynx  is  primarily  due  to  the 
change  in  the  lateral  contact  area  of  the  vocal  folds  [10].  Hence,  the 
EGG  is  a  measure  of  the  amount  of  vocal  fold  contact  area  and  not  of  the 
area  of  the  glottis,  i.e.,  the  EGG  signal  is  not  directly  related  to  the 
glottal  area  function.  Figure  3.1  shows  a  functional  block  diagram  of 
the electroglottograph with a typical EGG signal.

Figure 3.1 Functional block diagram of the electroglottograph
and its output signal, the EGG.

3.2 Previous Models

The  previous  section  described  the  electroglottographic  (EGG) 
signal,  or  electroglottogram  (EGG),  and  its  relationship  to  vocal  fold 
vibration.  In  summary,  Fourcin  [55]  upgraded  the  original  Fabre  device 
to  its  current  state.  Using  stroboscopic  photography  synchronized  with 
the  EGG  waveform,  he  concluded  that  the  EGG  amplitude  was  related  to  the 
amount  of  contact  between  the  two  folds  during  the  phonation  of  a  voiced 
sound. 

Fant et al. [56] combined the EGG with optical glottography and
inverse filtering of the speech waveform. Their results can be
summarized as four points: the opening instant of the folds is often
associated with a sudden change in the slope of the opening phase of the
EGG; the "outstanding feature" is the rapid fall of the EGG when the
folds reach contact in the closing phase; the flat top of the EGG
corresponds to the open phase of the glottal cycle; and the ascending
portion of the EGG corresponds to the opening phase of the glottal
cycle.

In  order  for  electroglottography  to  be  useful  to  the  clinician  or 
speech  researcher  the  electroglottographic  waveform  must  be  related  to  a 
model  of  vocal  fold  vibratory  behavior.  Fog-Pedersen  [57]  carried  out  a 
study  similar  to  the  one  done  by  Fourcin,  where  stroboscopic 
observations  of  the  fold  vibrations  are  synchronized  with  the  EGG 
signal.  Based  on  these  observations,  he  constructed  a  model  for  the  EGG 
during  a  single  cycle,  as  shown  in  Figure  3.2. 

Lecluse [58] also synchronized electroglottography with
stroboscopic photography and constructed a model similar to the
Fog-Pedersen model. However, he included details that related specific
points in the EGG waveform to different events in the vibratory cycle
of the vocal folds. His model is presented in Figure 3.3.

1 Maximum opening phase
2 Maximum closing phase
Points 3 and 4 are changes from the plateau to the glottal slope of the
glottographic curves.

Figure 3.2 Fog-Pedersen's model for the EGG.

Rothenberg  [59]  correlated  the  EGG  with  the  glottal  volume  velocity 
waveform  derived  by  analog  inverse  filtering  the  speech  signal.  He  used 
an  idealized  model  of  the  EGG  signal,  shown  in  Figure  3.4,  to  describe 
the  relationship  between  the  vocal  fold  vibration  and  different  features 
in  the  EGG  waveform. 

Krishnamurthy  [11]  used  a  large  data  base  of  synchronized  EGG, 
glottal  area  function,  and  glottal  volume  velocity  to  modify  the 
Rothenberg  model.  He  used  the  differentiated  EGG  waveform  to  pinpoint 
the  opening  and  closing  instants  in  the  glottal  cycle.  This  modified 
model  is  presented  in  Figure  3.5. 

Childers et al. [32-34,60,61] used their extensive data base of
ultra-high  speed  laryngeal  films  synchronized  with  EGG  and  speech 
waveforms,  to  further  modify  the  Rothenberg  model.  The  resulting  model 
is  depicted  in  Figure  3.6. 

Other  researchers  [62-64]  carried  out  various  experiments  using  the 
EGG  synchronized  with  stroboscopic  photography  and/or  photoglottography 
and  the  air  flow  signal.  Their  results  support  the  notion  that  the  EGG 
signal  is  inversely  related  to  the  contact  area  of  the  vocal  folds. 

The  above  models  were  based  on  extensive  observations  of  the  EGG, 
synchronized  with  stroboscopic  photography,  photoglottography,  and 
ultra-high  speed  cinematography  along  with  the  glottal  volume  velocity 
waveform.  These  models  are  descriptive  in  nature  and  do  not  give 
quantitative  measures  of  different  parameters  of  the  vocal  fold 
vibrations.

1 is the moment of initial closure at a single point
2 is the moment at which closure is completed over the whole
  length, but not in the vertical plane
3 is the moment at which closure is completed over the whole
  vertical plane
4 is the moment at which opening begins
5 is the moment at which the whole length is open

Figure 3.3 Lecluse's model for the EGG.

1-2 vocal folds maximally closed
3-4 folds separating from lower margins towards upper margins
4-5 upper fold margins separating
7   lower margins close
3-7 folds apart
1   closure reaches upper fold margins

Figure 3.4 Rothenberg's model for the EGG.

A-B glottis is open, folds are moving away from each other
B-C glottis is open, folds are moving towards each other
C   folds make initial contact
C-D folds close rapidly
D   time of maximum negative value in Diff. EGG
D-F closed phase region
E   folds reach maximal lateral area of contact

Figure 3.5 Krishnamurthy's qualitative EGG model (EGG and
differentiated EGG traces).

ELEMENTARY EGG MODEL

may not be obtained. Flat portion idealized.
Folds parting, usually from lower margins toward upper margins.
When this break point is present, this usually corresponds to folds
opening along upper margin.
Upper fold margins continue to open.
Folds apart, no lateral contact. Idealized.
Open phase.
Glottal area zero. Folds in contact along lower margin. Idealized.
Folds closing from lower to upper margin.
Rapid increase in vocal fold contact.

Figure 3.6 Modified Rothenberg EGG Model.

Titze et al. [65] and Titze [66] presented a mathematical model for
computing various glottographic waveforms. This model is still in the
development stage.

In  the  following  section  we  present  a  different  mathematical 
development  that  generates  and  models  the  EGG  waveform  directly  from  the 
contact  area  of  the  vocal  folds.  The  calculated  EGG  corresponds 
accurately  with  the  area  function  of  the  vibrating  folds.  Also,  we  show 
illustrative  results  for  predicting  EGG  waveforms  for  models  of 
vibrating  vocal  folds  that  have  a  nodule  or  polyp  on  one  fold. 

3.3  Simple  Unitary  Mass  Model  of  the  Vocal  Folds 

Our  simple  model  is  an  extension  of  the  two-mass  vocal  fold 
articulatory  speech  synthesis  model  of  Ishizaka  and  Flanagan  [67], 
Flanagan  and  Ishizaka  [68-70],  and  Flanagan,  Ishizaka,  and  Shipley 
[71].   The  Flanagan  and  Ishizaka  two-mass  model  provides  sufficient 
details  to  calculate  the  speech  waveform,  the  pressure  and  air  volume- 
velocity  distributions  in  the  vocal  cavities,  the  motion  of  the  vocal 
folds,  the  glottal  area,  the  vibration  of  the  cavity  walls,  and  other 
factors.  A  failing  of  this  two-mass  model  is  that  the  projected  glottal 
area  is  always  rectangular.  The  folds  are  either  open  or  closed;  there 
is  no  gradation  of  opening  or  closure.   The  lateral  contact  area  is 
stair-stepped,  being  zero  (open  glottis),  next  partial  contact  (lower 
masses  in  contact),  and  finally  full  contact  (lower  and  upper  masses 
both  in  contact).  This  approximation  does  not  adequately  replicate  the 
vibratory motion of true vocal folds.

A more realistic simulation of the vocal folds is achieved with a
unitary mass model as depicted in Figure 3.7. Here the mass of each
fold is thought of as a plastic wedge. The horizontal displacements, $d_2$
and $d_1$, of the superior (upper) and inferior (lower) margins (edges),
respectively, of the wedge that simulates the vocal folds are
determined from the two-mass model. The wedge is constructed by using a
straight line interpolation between points $d_1$ and $d_2$. The thickness of
the simulated folds is T and their length is L. The ratio of L to T can
be specified. In the examples to follow we have set L/T = 5. The model
assumes a plastic collision between the right and left vocal folds. The
length of vertical contact, $\Delta x$, can then be easily computed and the
contact area may be estimated at a particular time instant as

$$ A = L \, \Delta x \qquad (3.1) $$

The  EGG  waveform  is  proportional  to  the  reciprocal  of  A.  Figure  3.8a 
shows  the  EGG  signal,  its  derivative,  and  the  calculated  glottal  area, 
all  estimated  with  the  aid  of  this  model.  Compare  these  results  with 
Figure  3.8b,  which  shows  a  measured  EGG  and  glottal  area.  The 
differentiated  EGG  (DEGG)  will  be  discussed  further  in  another  section. 
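
The contact computation for the simple wedge can be sketched as follows. This is one possible reading, stated as an assumption: a margin is taken to be in contact when its displacement from the midline is zero or negative, the edge of the wedge is a straight line between the lower and upper margins, and the contact length is the part of that line at or beyond the midline. The function names, the small positive offset `c` that keeps the reciprocal finite when there is no contact, and the numerical values are all hypothetical.

```python
def contact_length(d1, d2, thickness):
    """Vertical length over which the wedge edge reaches the midline.

    d1, d2: displacements of the lower and upper margins from the midline;
    a value <= 0 means that margin has reached the midline (plastic collision).
    """
    if d1 > 0 and d2 > 0:
        return 0.0                       # folds apart, no contact
    if d1 <= 0 and d2 <= 0:
        return float(thickness)          # contact over the full thickness
    # Exactly one margin is in contact: the straight edge crosses the midline
    # at this height above the lower margin.
    crossing = thickness * d1 / (d1 - d2)
    return crossing if d1 <= 0 else thickness - crossing


def contact_area(d1, d2, thickness, length):
    """Lateral contact area A = L * (contact length)."""
    return length * contact_length(d1, d2, thickness)


def egg_sample(area, k=1.0, c=1.0):
    """EGG value taken as inversely related to contact area (offset c is assumed)."""
    return k / (area + c)


T, L = 1.0, 5.0                          # arbitrary units with L/T = 5
for d1, d2 in [(0.2, 0.1), (-1.0, 1.0), (1.0, -3.0), (-1.0, -1.0)]:
    a = contact_area(d1, d2, T, L)
    print(d1, d2, a, egg_sample(a))
```

As the contact area grows from zero to its full value, the sketched EGG value falls, in line with the inverse relationship stated above.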

This  simple  model  of  the  EGG  signal  is  a  good  first  approximation 
relating  glottal  opening  and  glottal  closure  to  EGG  events,  as  depicted 
earlier  in  Figure  3.5.  However,  it  does  not  account  for  the  different 
angles  of  glottal  closure  from  anterior  to  posterior  and  of  glottal 
opening from posterior to anterior. Note also that this model is a
simplified method for calculating the lateral area of contact of the
vocal folds. We do not account for conservation of momentum, nor do we
allow the folds to expand as they compress and thereby possibly increase
the lateral contact area by pushing the vocal fold tissue both
superiorly and inferiorly along the mid-sagittal contact plane. The
model is a simple additional calculation, appended to the articulatory
speech synthesis model, which estimates, to a first approximation, the
lateral contact area. As can be seen from Figure 3.8, this simple model
works reasonably well.

Figure 3.7 Triangular unitary mass vocal fold model, top
and lateral views.

Figure 3.8 (a) EGG, differentiated EGG (DEGG), and the glottal
area calculated from the simple unitary mass vocal fold model;
(b) measured glottal area and EGG waveforms.

3.4  Triangular  Unitary  Mass  Model  of  the  Vocal  Folds 

High  speed  films  of  the  vocal  folds  of  males  in  modal  register  show 
that  there  exists  a  phase  difference  along  the  length  of  the  vocal  folds 
during  their  vibration  and  therefore  during  the  closing  (opening) 
phase.  During  closure,  contact  between  the  folds  first  occurs  over  a 
small  portion  of  their  length.  Closure  continues,  zipper-like,  along 
the  length  of  the  folds  until  the  glottis  is  closed.  Similar  behavior 
occurs  during  the  opening  phase.  The  angles  of  vocal  fold  closure  and 
opening  differ  from  one  another. 

We can model this behavior as in Figure 3.9 by creating an angle,
$\theta$, between the left and right vocal folds. The contact area is now
proportional to $\Delta l$. This is similar to the approach taken by Titze
[66].

In  addition  to  the  longitudinal  vocal  fold  phase  differences 
described  above,  a  vertical  (superior-inferior)  phase  difference  between 
the upper and lower margins of the vocal folds has also been observed.

An artistic rendition of the elastic, flexible unitary mass vocal
fold  model  and  its  relation  to  EGG  waveform  events  is  depicted  in  Figure 
3.10.  Note  the  vertical  phase  difference  between  the  edges  of  each  mass 
as  well  as  the  longitudinal  phase  difference.  The  latter  phenomenon 
resembles  the  action  of  a  zipper  being  closed  and  opened. 

The  manner  in  which  our  model  functions  is  similar  to  that  shown  in 
Figure  3.10  for  the  flexible  one  mass  model.  This  model  works  as 
follows.  The  upper  and  lower  glottal  areas  (AG2  and  AG1,  respectively) 
are  calculated  using  the  Flanagan-Ishizaka  two-mass  vocal  fold 
articulator  synthesis  model.  The  displacements  of  these  parallel 
masses  of  the  vocal  folds  (both  upper  and  lower)  from  the  mid-sagittal 
line  is  calculated  for  a  particular  time  instant,  n,  as 

$$d_1(n) = \frac{AG1(n)}{2L} \tag{3.2}$$

$$d_2(n) = \frac{AG2(n)}{2L} \tag{3.3}$$

where  n  is  a  time  index  and  L  is  the  length  of  the  vocal  folds  (see 
Figure  3.7).  These  displacement  values  are  used  to  position  the  upper 
and  lower  margins  at  the  posterior  ends  of  the  vocal  folds  in  the  model 
in Figure 3.9. This modified model has a triangular glottal area with
an angle θ as shown. The phase difference between d1(n) and d2(n) is
maintained  along  the  complete  length  of  the  unitary  model  of  the  vocal 
folds. 
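
Equations (3.2) and (3.3) can be sketched in a few lines of Python. This
is a minimal illustration only, not the program used in this work; the
function name, the sample areas, and the fold length are hypothetical,
and AG1 and AG2 are assumed to be available already as sampled sequences.

```python
def margin_displacements(ag1, ag2, fold_length):
    """Return (d1, d2), the lower and upper margin displacements.

    d1(n) = AG1(n) / (2L) and d2(n) = AG2(n) / (2L), where AG1 is the
    lower glottal area, AG2 the upper glottal area, and L the fold length.
    """
    if fold_length <= 0:
        raise ValueError("fold_length must be positive")
    d1 = [area / (2.0 * fold_length) for area in ag1]
    d2 = [area / (2.0 * fold_length) for area in ag2]
    return d1, d2


# Hypothetical sample values (areas and length in consistent units).
d1, d2 = margin_displacements([0.0, 0.1, 0.2], [0.0, 0.05, 0.15], 1.4)
```

Each displacement is half the width of a rectangular glottis having the
same area and length, which is why the factor 2L appears.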

Figure 3.10 The flexible one mass model

The vocal fold contact area is calculated using this triangular,
phase shifted configuration. Several conditions may be specified in the
computer program implementation of this model: the folds are not in
contact (no contact area); the lower margins of the folds are in contact
and the upper margins are not (lateral area of contact is triangular);
the lower margins are not in contact and the upper margins are (lateral
area of contact is again triangular); and both upper and lower vocal
fold margins are in contact and possibly out of phase (lateral area of
contact is trapezoidal, the condition shown in Figure 3.9).
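
The four contact conditions can be sketched as follows. This is only an
illustrative sketch: it assumes that the lengths of the lower and upper
contact margins and the depth of contact have already been computed, and
that the lateral contact area is the trapezoid they bound, which reduces
to a triangle when one margin is out of contact. The function name and
the area formula are assumptions made for illustration, not the geometry
used in the actual program.

```python
def contact_area(lower_len, upper_len, depth):
    """Classify the contact condition and return (shape, area).

    lower_len, upper_len: contact lengths of the lower and upper margins
    depth: vertical extent of the contact region
    """
    lower_len = max(lower_len, 0.0)
    upper_len = max(upper_len, 0.0)
    if lower_len == 0.0 and upper_len == 0.0:
        return "none", 0.0
    if lower_len > 0.0 and upper_len > 0.0:
        shape = "trapezoidal"
    else:
        shape = "triangular"
    # Trapezoid rule; with one side equal to zero this is a triangle.
    area = 0.5 * (lower_len + upper_len) * depth
    return shape, area
```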

With  these  conditions  the  EGG  waveform  is  specified  as 


$$EGG(n) = \frac{k}{A(n) + C} \tag{3.4}$$


where  n  is  the  time  index,  A(n)  is  the  lateral  contact  area,  C  is  a 
constant  proportional  to  the  shunt  impedance  specified  for  the  case  when 
A(n)  =  0,  and  k  is  a  scaling  constant.  The  glottal  area  is  calculated 
in  the  model  using  the  projected  triangular  glottal  area  configuration, 
not  the  projected  area  given  by  AG1  and  AG2  of  the  two-mass  vocal  fold 
articulator  speech  synthesis  model. 
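
Equation (3.4) can be sketched as follows, assuming the inverse-area
form EGG(n) = k / (A(n) + C). That reading is an assumption; it is
consistent with the later description of C as a fixed area and with the
concept of inverse lateral contact area discussed in Section 3.6. The
function name is hypothetical, and the default values of k and C are
those quoted in the algorithm of this section.

```python
def egg_from_contact_area(contact_area, k=1.0, c=10.0):
    """Map lateral contact area samples A(n) to EGG samples.

    Assumed form: EGG(n) = k / (A(n) + C). With no contact (A = 0) the
    output is k / C, the level set by the shunt path alone.
    """
    if c <= 0:
        raise ValueError("c must be positive to keep the output finite")
    return [k / (a + c) for a in contact_area]


# More contact area gives a lower EGG value under this convention.
egg = egg_from_contact_area([0.0, 5.0, 10.0])
```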

Algorithm EGG Simulation

(1) Using the Flanagan-Ishizaka two-mass vocal fold
articulator speech synthesis model, obtain AG1 and AG2
(defined earlier).

(2) Specify opening and closing angles, θ_o and θ_c, respectively.

(3) Specify phase shift between AG1 and AG2.

(4) Let N be the number of area samples computed in step 1.

(5) Do for n = 1, N.

(a) Compute d1(n) and d2(n) using equations (3.2) and (3.3).

(b) If AG1(n) < AG1(n+1) or AG2(n) < AG2(n+1) set
ANG = θ_o; folds are opening. Else set ANG = θ_c;
folds are closing.

(c) Recompute glottal area based on current folds
configuration.

(d) Compute length of upper and lower contact margins.

(e) Compute the depth of contact area.

(f) Compute contact area.

(g) Compute EGG using equation (3.4). k and C are arbitrarily
set to 1 and 10, respectively.

(h) Compute the differential EGG (DEGG) using the filter
H(z) = 1 - z^{-1}. If done go to step 6 else go to step
5.

(6) Plot EGG, DEGG, glottal area.

(7) End.

The EGG model described above is summarized in Figures 3.11 and 3.12.
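
Steps 5(b) and 5(h) of the algorithm can be sketched as follows. This is
a minimal sketch under stated assumptions, not the original program: the
function names are hypothetical, the last sample is compared with itself
because it has no successor, and the sample before the first EGG value
is taken equal to the first value so that the DEGG starts at zero.

```python
def select_angles(ag1, ag2, theta_open, theta_close):
    """Step 5(b): choose the fold angle for every sample.

    The folds are taken to be opening when either glottal area is about
    to increase; otherwise they are taken to be closing.
    """
    last = len(ag1) - 1
    angles = []
    for n in range(len(ag1)):
        nxt = min(n + 1, last)  # assumption: final sample has no successor
        opening = ag1[n] < ag1[nxt] or ag2[n] < ag2[nxt]
        angles.append(theta_open if opening else theta_close)
    return angles


def differentiate(egg):
    """Step 5(h): DEGG(n) = EGG(n) - EGG(n-1), i.e. H(z) = 1 - z^-1."""
    if not egg:
        return []
    # assumption: EGG(-1) = EGG(0), so the first DEGG sample is zero
    return [0.0] + [egg[n] - egg[n - 1] for n in range(1, len(egg))]
```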

Figure 3.12 Flow chart for EGG model calculations

3.5 Simulation Results

To help orient the reader the first example is illustrated in
Figure 3.13 for the following conditions: opening angle, θ_o = 1.0°;
closing angle, θ_c = 0.2°; and a lag of 0.8 ms between the upper and
lower vocal fold margins. These values have been found to simulate
features of an actual EGG quite well. In subsequent subsections we
demonstrate the effects on the EGG waveform of varying one parameter at
a time. The lag can be specified in the model as a fraction of the
fundamental period of voicing. Here we represent this lag in units of
ms. The maximum and minimum displacements of both the upper and lower
margins are the same. The collision of the unitary masses is assumed to
be plastic, perhaps like the collision of two wedges of putty. There is
a strong resemblance between this simulated EGG waveform and the
measured EGG waveform in Figure 3.8b except the model waveform does not
have the rounded corners of the measured waveform.
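
Because the lag may be given either as a fraction of the fundamental
period or in ms, the conversion between the two can be sketched as
below. The function names are hypothetical, and the 125 Hz fundamental
frequency used in the example is an assumed value chosen only for
illustration.

```python
def lag_ms_to_fraction(lag_ms, f0_hz):
    """Convert a lag in ms to a fraction of the fundamental period."""
    period_ms = 1000.0 / f0_hz
    return lag_ms / period_ms


def lag_fraction_to_ms(fraction, f0_hz):
    """Convert a lag given as a fraction of the period to ms."""
    return fraction * 1000.0 / f0_hz


# A 0.8 ms lag at an assumed 125 Hz (8 ms period) is one tenth of a period.
fraction = lag_ms_to_fraction(0.8, 125.0)
```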

We  now  use  this  model  to  show  the  effects  of  the  following: 
varying  the  opening  angle,  varying  the  closing  angle,  varying  the  phase 
difference  between  the  upper  and  lower  margins  of  the  folds,  a  mucus 
strand  bridging  the  vocal  folds  during  the  opening  phase,  and  vocal  fold 
polyps  or  nodules  on  the  EGG  model  waveform. 

3.5.1 Varying the Opening Angle, θ_o

For these calculations the closing angle was fixed at θ_c = 0.2°
while d1(n) and d2(n) were calculated using the two-mass articulator
synthesis model. Figure 3.14a shows the effects of varying θ_o from 0.2°
to 2.7°. As θ_o increases, the rising slope of the EGG, which
corresponds to the opening phase of the glottal area waveform,
decreases. A bend in the rising portion of the EGG is visible.
For θ_o > 3.5° the EGG rises gradually as the model vocal folds open
until the angle of opening reaches the specified θ_o, at which point the
EGG jumps suddenly, in step-like fashion, to the maximum EGG value.
This discontinuity phenomenon is due to the constant C being specified
for small values of θ_o and then left unchanged for all succeeding
calculations. This problem might be resolved by letting the constant C
in equation (3.4) be a function of the opening angle.

Figure 3.14 Simulated EGG, DEGG, and glottal area waveforms for various
vocal fold angles (a) opening, θ_o, and (b) closing, θ_c.

Note  that  the  differentiated  EGG  (DEGG)  waveform  marks  the  instant 
of  glottal  opening  and  closing  with  its  positive  and  negative  peaks, 
respectively.  However,  as  the  angle  of  opening  increases,  the  positive 
DEGG  peak  becomes  broader,  making  the  decision  for  the  instant  of 
glottal opening more difficult. For our data, a good EGG model waveform
is best simulated using 0.5° < θ_o < 2°.
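
The use of DEGG peaks as markers can be sketched as follows for a single
cycle. This is an illustrative sketch only; the function name and the
sample waveform are hypothetical. As the text cautions, a broad positive
peak makes the opening estimate less trustworthy, and this simple
extremum search does not address that.

```python
def degg_landmarks(egg):
    """Return (opening_index, closing_index) for one EGG cycle.

    The opening instant is taken at the largest positive DEGG sample
    and the closing instant at the largest negative DEGG sample.
    """
    if len(egg) < 2:
        raise ValueError("need at least two samples")
    degg = [0.0] + [egg[n] - egg[n - 1] for n in range(1, len(egg))]
    opening = max(range(len(degg)), key=degg.__getitem__)
    closing = min(range(len(degg)), key=degg.__getitem__)
    return opening, closing


# Hypothetical cycle: EGG rises as the folds open and falls as they close.
cycle = [1.0, 1.0, 1.0, 3.0, 4.0, 4.5, 4.5, 4.5, 1.0, 1.0]
opening_index, closing_index = degg_landmarks(cycle)
```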

3.5.2 Varying the Closing Angle, θ_c

For these calculations, the opening angle was fixed at θ_o = 2°,
while d1(n) and d2(n) were calculated as before. Figure 3.14b shows the
effects of varying θ_c from 0.0001° to 2°. For θ_c > 5° a stair-step
discontinuity occurs in the falling slope of the EGG, which corresponds
to the closing phase of the glottal area. As θ_c increases the falling
slope of the EGG decreases.

For  small  closing  angles,  the  largest  negative  peak  in  the  DEGG 
waveform  does  not  occur  at  the  instant  of  zero  glottal  area,  but  several 
instants  later.  However,  when  the  angle  of  closure  is  quite  large, 
namely θ_c = 2°, then the large negative peak in the DEGG corresponds
approximately  to  the  instant  of  zero  glottal  area.  This  appears  to 
agree  with  actual  measured  data.  The  instant  of  zero  glottal  area 
corresponds  to  the  instant  at  which  the  EGG  starts  its  downward 
(negative)  deflection.  This  instant  generally  occurs  in  the  model 
waveforms just slightly before the instant of greatest negative slope,
i.e., the instant at which the DEGG has its largest negative value. A
good EGG model waveform is best simulated using 0.0001° < θ_c < 0.5°.

3.5.3  Varying  the  Phase  Difference  Between  Upper  and  Lower  Vocal  Fold 
Margins 


For these calculations θ_c = 0.2° and θ_o = 2° and the minimum
and maximum upper and lower margin displacements are equal, but the
upper vocal fold margin lags the lower margin. The simulated EGG
waveforms in Figure 3.15a closely resemble the measured EGG for a normal
voice phonation at low frequency, except for the lack of rounding of the
modeled EGG waveform during the glottal closed phase. When the phase
difference or lag between the vocal fold margins is zero, the resulting
simulated EGG is stylized with a steep closing phase and a gradual
opening phase.

For  a  lag  from  0.3  to  0.7  ms  the  falling  slope  of  the  EGG  is 
affected in a manner analogous to that caused by increasing θ_c. For a
lag  from  0.7  to  1.0  ms,  discontinuities  (steps)  occur  in  both  the  rising 
and  falling  slopes  of  the  EGG,  as  discussed  earlier.  With  a  lag  from 
1.3  to  4.2  ms,  the  simulated  EGG  resembles  that  produced  by  an 
individual  phonating  in  vocal  fry  (see  Figures  3.15b  and  c). 

For  large  lags  note  that  the  DEGG  waveform  does  not  correctly  flag 
the  instants  of  glottal  opening  and  closing  as  has  been  commonly 
assumed.  This  can  also  happen  when  vocal  fold  polyps  and  nodules  are 
present.  Consequently,  we  need  another  criterion  for  marking  the 
instants  of  glottal  opening  and  closing  when  "irregular"  or  "unusual" 
vocal fold vibrations are taking place.

Figure 3.15 Simulated EGG, DEGG, and glottal area waveforms for
various lag (phase) differences between upper and lower vocal fold
margins (a). Upper margin lags lower margin. Simulated vocal fry
(b) and measured vocal fry (c). Compare the measured EGG with the
simulated 2.0 ms lag EGG in part (b).

3.5.4 Effects of Mucus

Mucus  strands  that  bridge  the  vocal  folds  during  the  opening  phase 
have  subtle  effects  on  the  EGG  waveform.  During  the  opening  phase,  the 
EGG  rises  relatively  slowly  as  the  mucus  strand  stretches  but  continues 
to  bridge  the  vocal  folds.  At  some  point  in  the  opening  phase  the  mucus 
strand  breaks.  This  event  can  be  measured  from  our  ultra-high-speed 
laryngeal  films.  Just  after  this  break,  there  is  a  rapid  rise  (almost 
an  upward  step)  in  the  EGG  waveform.  This  phenomenon  is  seen  even  more 
readily  in  graphs  of  glottal  lengths. 

An  example  of  a  measured  EGG  waveform  with  a  large  amount  of  mucus 
present  on  the  vocal  folds  and  with  a  mucus  strand  bridging  the  vocal 
folds  during  the  opening  phase  is  shown  in  Figure  3.16a.   The  mucus 
phenomenon  described  above  is  more  readily  observable  but  still  subtle 
in  the  graphs  of  the  glottal  length  and  glottal  area.  These  graphs  have 
a  "bend"  in  the  rising  portion  of  the  waveform.  The  folds  are  open  at 
the  beginning  of  the  bend,  but  a  strand  of  mucus  bridges  the  folds. 
When  this  strand  breaks,  the  length  waveform  rises  sharply  to  its 
characteristic  flat  top.  While  the  EGG  waveform  appears  normal  for  this 
case, the differentiated EGG does not have its maximum at the instant of
glottal opening; rather, the maximum occurs just as the mucus strand
breaks. This point corresponds to the "knee" in the opening phase of
the  glottal  area  waveform.  The  EGG  waveform  reaches  its  maximum  just  as 
the  mucus  strand  has  broken.   Thus  an  excessive  amount  of  mucus, 
providing  a  highly  conductive  current  path,  may  distort  the  EGG 
measurement.    In  such  cases  the  EGG  may  not  be  an  accurate 
representation of the lateral area of contact of vocal fold tissue.

Figure 3.16 (a) Measured EGG when a mucus strand is present on the vocal
folds. Also shown are the DEGG, the glottal length (L), and the glottal
area (A). (b) Simulated EGG for a simulated mucus strand. The various
EGG curves represent a fractional decrease in the vocal fold lateral
area.

Figure 3.16b shows the effect of simulating various size mucus strands
bridging the vocal folds during the initial part of the vocal fold
opening phase and then breaking. This simulation mimics the measured
event quite well.

Another  mucus-related  phenomenon  is  reflected  in  the  EGG  waveform 
as  a  difference  in  the  impedance  of  the  vocal  folds  during  the  opening 
and  closing  phases  of  the  folds.  If  only  a  small  amount  of  mucus  is 
present  on  the  lateral  area  of  contact  of  the  folds  and  no  tissue 
compression  occurs  when  the  vocal  folds  collide,  then  the  impedance  of 
the  vocal  folds  at  the  instants  of  initial  glottal  opening  and  initial 
glottal  closing  should  be  the  same.  However,  when  a  large  amount  of 
mucus  is  present  the  impedance  of  the  folds  is  decreased  (since  the 
resistance  of  mucus  is  presumed  to  be  less  than  that  of  the  vocal  fold 
tissue).  The  EGG  waveform  will  now  be  slightly  lower  at  the  instant  of 
initial  glottal  opening,  because  of  the  mucus,  than  at  the  instant  of 
initial  glottal  closing.  This  can  be  seen  in  Figure  3.17. 

3.5.5  Effects  of  Nodules  and  Polyps 

According  to  Moore  [53]  polyps  and  nodules  on  the  vocal  folds  arise 
as  a  result  of  trauma  to  the  folds.  A  laryngeal  polyp  may  occur  as  a 
result  of  a  single  brief  period  of  vocal  strain,  whereas  a  nodule  often 
develops  over  a  longer  period  of  time  and  progresses  through  several 
stages. 

The  shape  and  consistency  of  the  polyp  or  nodule  affects  the  shape 
of  the  EGG  waveform.  We  believe  that  an  edematous  polyp  or  nodule  will 
have  a  more  pronounced  effect  on  the  EGG  than  a  fibrous  polyp  or 
nodule. This remains, however, a conjecture to be verified.

Figure 3.17 Measured EGG, glottal length (L), and glottal area (A)
for the case when a large amount of mucus is present on the vocal
folds. Vertical lines marking the initiation of glottal opening
and closing identify different impedance levels on the EGG.

A large polyp will cause an earlier contact of the vocal folds
during  the  closing  phase  but  not  a  greater  area  of  contact  unless  it  is 
very  soft.  A  large  protrusion  may  actually  reduce  the  amount  of  contact 
since  there  will  be  no  contact  adjacent  to  the  nodule.  However,  since 
nodules  are  in  the  loose  mucosa  they  slide  onto  and  off  the  upper 
surface  and  consequently  may  present  a  different  rate  and  amount  of 
contact  between  opening  and  closing.  Furthermore,  the  EGG  for  hard 
protrusions  should  differ  from  that  for  soft  protrusions,  i.e.,  show 
less  contact  area. 

The  effect  of  a  large  edematous  polyp  on  the  EGG  is  easily 
implemented  in  our  simulation  model  as  a  percentage  increase  in  the 
lateral  area  of  contact  as  the  folds  close.  We  can  vary  the  location, 
size,  and  effective  conductivity  of  the  polyp  or  nodule. 

Figure  3.18  shows  an  EGG  measured  from  a  patient  with  a  nodule 
located  near  the  middle  of  the  upper  margin  of  the  vocal  folds.  As  the 
folds  initiate  the  opening  phase,  the  EGG  progresses  in  a  normal 
fashion.  But  as  the  folds  continue  to  separate  beyond  the  location  of 
the  nodule,  the  EGG  level  remains  constant.  This  is  where  the  nodule  on 
one  fold  remains  in  contact  with  the  other  fold  for  a  brief  period.  The 
effective  lateral  area  of  contact  remains  constant.  For  a  short 
interval  the  contact  area  of  the  nodule  compensates  for  the  loss  of 
contact  area  due  to  the  folds  separating  below  and  beyond  the  nodule 
location.  In  the  same  figure  we  show  the  EGG  waveform  for  a  simulated 
nodule. 

Figure  3.19  illustrates  the  effect  of  varying  the  lateral  area  of 
contact of the nodule on the opening (rising) portion of the EGG
waveform.  The  larger  the  contact  area  of  the  nodule  the  longer  the  flat 
segment  of  the  rising  portion  of  the  EGG  waveform.  The  location 
(anterior/posterior)  of  the  nodule  can  be  varied  as  well  in  the  model, 
as  seen  in  Figure  3.20.  These  examples  illustrate  the  potential  of  the 
EGG  to  estimate  the  size  and  location  (anterior/posterior)  of  a  vocal 
fold  nodule  or  polyp. 

3.6  Discussion 

Our  simple  model  of  vocal  fold  vibrations  and  EGG  waveform 
generation  does  not  satisfy  all  the  properties  of  physics,  e.g., 
conservation  of  momentum.  The  collision  of  actual  folds  results  in 
their  deformation.  This  may  lead  to  an  increase  in  the  lateral  contact 
area  because  the  mucosal  layer  is  pushed  both  superiorly  and  inferiorly 
along  the  surface  of  contact.  This  deformation  of  mass  is  not  accounted 
for  in  our  model.  These  imperfections  can  and  should  be  overcome  with 
future  improvements  to  the  model.  Despite  this  incompleteness  in  the 
model,  we  found  the  model  could  replicate  many  aspects  of  the  EGG 
waveform  via  the  concept  of  inverse  lateral  vocal  fold  contact  area.  We 
believe  the  numerous  examples  illustrated  in  the  previous  sections 
substantiate  this  claim.  The  concept  of  vocal  fold  mass  compression 
seems  to  be  of  little  importance  in  explaining  the  major  features  of  the 
EGG  waveform. 

We  stress  that  our  model  is  presently  best  suited  for  a  normal  male 
voice  in  modal  register.  This  is  because  the  vocal  folds  are  apparently 
thicker  for  males  and  our  observations  concerning  opening  and  closing 
angles and upper and lower margin lags have been made primarily from
ultra-high-speed laryngeal films for this case. Higher pitched voices
(both  males  and  females)  tend  to  have  thinner,  longer  vocal  folds.  The 
initial  point  of  vocal  fold  contact  in  this  case  may  occur  at  the  middle 
of  the  vocal  folds.  Complete  closure  may  not  occur.  The  folds  also 
appear  to  be  quite  thin,  with  perhaps  only  one  margin,  i.e.,  no  upper 
and  lower  vocal  fold  margins.  These  concepts  can  be  modeled,  but  we 
have  yet  to  do  so. 

Several  interesting  observations  can  be  drawn  from  our  modeled 
data.  The  differentiated  EGG  (DEGG)  has  been  considered  by  users  of  the 
EGG  to  be  a  reliable  indicator  of  the  instants  of  glottal  opening  (large 
positive  peak)  and  closing  (large  negative  peak).  This  concept  appears 
to  be  valid  for  a  normal  male  voice  in  modal  register  with  vocal  fold 
closure.  However,  the  simulation  of  vocal  fold  vibratory  motion  that 
either  departs  from  modal  register  or  is  impaired  by  a  vocal  fold  nodule 
predicts  that  the  DEGG  is  no  longer  a  reliable  indicator  of  the  instants 
of glottal opening and closing. The model predicts that closure occurs
later than the measured data indicate, and that the peak used to
predict opening is very broad.

By  using  a  set  of  model  EGG  templates  for  a  series  of  fixed  opening 
and  closing  vocal  fold  angles,  one  may  be  able  to  estimate  these  angles 
from  a  measured  EGG  waveform.  Similar  remarks  apply  to  estimating  the 
location  and  size  of  a  vocal  fold  nodule.  But  more  work  is  needed 
before  we  can  make  such  predictions  reliably  and  consistently. 
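
The template idea can be sketched as a nearest-template search. This is
a hypothetical illustration, not a method developed in this work: it
assumes each template is a simulated EGG cycle of the same length as the
measured cycle, labelled by its (opening angle, closing angle) pair, and
it uses a mean squared error after amplitude normalization as the match
criterion. The waveforms shown are invented for the example.

```python
def normalize(waveform):
    """Scale a waveform to the range 0..1 (absolute level is ignored)."""
    lo, hi = min(waveform), max(waveform)
    if hi == lo:
        return [0.0] * len(waveform)
    return [(x - lo) / (hi - lo) for x in waveform]


def best_template(measured, templates):
    """Return the (theta_open, theta_close) key of the closest template."""
    target = normalize(measured)

    def error(key):
        candidate = normalize(templates[key])
        if len(candidate) != len(target):
            raise ValueError("template and measured cycle lengths differ")
        total = sum((a - b) ** 2 for a, b in zip(target, candidate))
        return total / len(target)

    return min(templates, key=error)


# Invented example templates keyed by (opening angle, closing angle).
templates = {
    (1.0, 0.2): [0.0, 1.0, 2.0, 1.0, 0.0],
    (2.0, 0.5): [0.0, 0.0, 2.0, 2.0, 0.0],
}
estimate = best_template([10.0, 12.0, 14.0, 12.0, 10.0], templates)
```

Normalizing first reflects the observation that the absolute magnitude
of the EGG carries little information.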

A factor which plagues the users of the EGG is determining whether or
not the glottis is closed; i.e., can a closed glottis be determined from
the EGG waveform alone? Presently, the answer to this question is no!
The EGG waveform of a breathy voice with an open glottal chink may look
essentially  the  same  as  an  EGG  waveform  where  complete  glottal  closure 
has  occurred.  If  the  determination  of  complete  glottal  closure  is 
important,  then  the  investigator  must  use  other  means  (such  as  listening 
to  the  voice)  to  assess  whether  complete  glottal  closure  has  occurred. 

The  model  is  presently  inadequate  with  respect  to  duplicating  the 
rounded  segments  of  measured  EGG  waveforms.  These  rounded  segments  are 
most  prominent  in  the  closed  and  open  phases.  During  the  closed  phase, 
we  feel  the  vocal  fold  tissue  has  compressed  and  deformed  to  actually 
increase  the  lateral  area  of  vocal  fold  contact.  Our  model  does  not 
duplicate  this  phenomenon  but  one  can  easily  conceptualize  this  effect 
and  predict  that  such  behavior  in  the  model  would  lead  to  a  more 
realistic  simulation  of  a  measured  EGG  waveform.  During  the  opening 
phase, a different phenomenon is occurring, and we do not have as ready
an explanation. Apparently the tissue conductance path changes
gradually as the folds open. This path should not be modeled by a fixed
impedance or fixed area (the parameter C in our model). More work is
needed  to  understand  this  phenomenon  better. 

The  absolute  magnitude  of  the  EGG  waveform  in  our  model  has  little 
importance,  since  the  waveform  amplitude  can  be  scaled  using  the 
constant k in our model. This is in agreement with previous
observations by Childers et al. [60] and Baer et al. [63]. But the
relative  amplitude  does  give  information  regarding  the  relative 
separation  of  the  folds. 

The model of the vocal fold vibratory motion is only a very rough
approximation to that observed for real vocal folds. Some investigators
would depict the vibratory motion of one fold as more analogous to a
vibrating  string  or  large  rubber  band,  anchored  at  the  anterior  and 
posterior  ends.  This  description  seems  more  adequate  for  higher  pitched 
registers.  The  model  we  have  implemented  is  more  appropriate  for  modal 
register.  Despite  its  obvious  faults,  the  model  has  replicated  many 
aspects  of  the  EGG  waveform. 

The  artistic,  unitary,  one  mass  vocal  fold  model  depicted  in  Figure 
3.10  can  be  improved  by  extending  the  model  to  include  two  masses  as 
shown  in  Figure  3.21.  This  improved  model  more  accurately  depicts  the 
vocal  fold  vibratory  motion  by  allowing  the  glottis  to  close  with  only 
partial (approximately one-half) vocal fold contact. A comparison of
Figures 3.10 and 3.21 allows one to conclude that the vocal fold
vibratory events are more precisely matched with EGG waveform segments
in Figure 3.21 than in Figure 3.10. This addition to the model,
together with those described above, should markedly improve our
ability to relate vocal fold vibratory motion events to EGG waveform
segments.

Figure 3.21 EGG waveform and the flexible, elastic two-mass vocal
fold model. The vocal fold model events are labeled on the EGG
waveform. The upper and lower vocal fold margins on each mass are
out of phase. Artistic license was taken to illustrate vocal
fold motion.

3.7 Conclusions

Our triangular unitary mass model of the vocal folds has
incorporated several concepts of vocal fold vibrations, three of which
are the opening angles of the vocal folds, the closing angles of the
vocal folds, and the phase shift between the upper and lower margins of
the folds. We have found that EGG waveforms for a normal male voice in
modal register are simulated when

0.5° < θ_o < 2°

0.0001° < θ_c < 0.5°

0.3 < lag < 0.7 (msec)
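
These ranges can be written as a simple check on a candidate parameter
set. The sketch below is illustrative only; the names are hypothetical,
and the strict inequalities follow the ranges as stated above.

```python
# Parameter ranges for a normal male voice in modal register.
MODAL_RANGES = {
    "theta_open_deg": (0.5, 2.0),
    "theta_close_deg": (0.0001, 0.5),
    "lag_ms": (0.3, 0.7),
}


def out_of_range(theta_open_deg, theta_close_deg, lag_ms):
    """Return the names of parameters outside the modal-register ranges."""
    values = {
        "theta_open_deg": theta_open_deg,
        "theta_close_deg": theta_close_deg,
        "lag_ms": lag_ms,
    }
    return [
        name
        for name, (low, high) in MODAL_RANGES.items()
        if not low < values[name] < high
    ]
```

An empty list means that all three parameters lie inside the stated
ranges.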


We  have  focused  our  work  on  the  normal  male  voice  in  modal  register 
because  we  know  more  about  the  vibratory  motion  of  the  vocal  folds  for 
this  case.  We  have  simulated  a  few  examples  of  vocal  fry  as  well  as  the 
effects  of  mucus  and  nodules. 

In  the  simulations  of  vocal  nodules  (and  polyps)  we  can  predict  the 
approximate  size  and  location  of  the  nodule  (being  anterior  or  posterior 
on  the  vocal  folds)  by  observing  the  EGG  waveform  segment  on  which  a 
departure  from  "normal"  functioning  occurs.  This  is  a  form  of  the 
inverse  problem,  i.e.,  given  an  EGG  waveform  we  can  begin  to  predict 
vocal  fold  configurations.  More  work  on  this  "inverse"  problem  is 
needed. 

One  observation  about  vocal  fry  is  in  order.  Vocal  fry  is  defined 
perceptually  to  be  a  low  pitched,  rough-sounding  phonation  that 
corresponds  to  aperiodic  low  frequency  vocal  fold  vibration.  The  model 
can  simulate  a  fry-like  EGG  waveform  at  any  pitch,  however,  because  the 
lag  between  the  upper  and  lower  margins  may  be  specified  independent  of 
the  fundamental  frequency.  In  the  near  future  one  could  use  the 
articulatory  speech  model  to  synthesize  aperiodic  speech  at  various 
pitch  frequencies  and  record  the  listener's  report  of  the  auditory 
perception  of  such  simulations.  This  should  allow  us  to  compare  and 
contrast  the  vibratory  basis  of  vocal  fry  and  other  high-pitched  vocal 
roughness. One also could simulate EGG waveforms for high pitched
voices, e.g., female voices, and for other voice registers and voice
qualities.


CHAPTER  4 
DATA  COLLECTION  AND  MEASUREMENT  SYSTEM 

The present study relies on obtaining minute details of the
mechanics of the vibrations of the vocal folds. The data collection
system used for this purpose is shown in Figure 4.1. The major building
blocks of this system are: an ultra-high-speed cinematography camera; a
high intensity lamp for illumination of the vocal folds; an
electroglottograph for obtaining the EGG; a directional hearing
microphone for transducing the acoustic pressure wave into the
electrical speech signal; a timing code generator for synchronizing the
recorded and filmed data; tape recorders for recording speech, EGG, and
timing signals; and a grid projector for absolute measurement of the
filmed vocal folds image.

The system is designed to photograph the vibration of the folds of
a subject at rates in excess of 5000 frames/sec. The EGG and speech
signals are recorded for the phonated vowel; in this study, the vowel is
/i/. The speech signal and the EGG are recorded in synchrony with the
film by a timing signal and a timing code. This synchrony is obtained by
photographing the 5 kHz timing signal, the timing code (to be
described), and the EGG signal on the edges of the laryngeal film while
simultaneously recording the speech, EGG, and another 10 kHz timing
signal on tape recorders. The 10 kHz timing signal is in phase with the
5 kHz timing signal photographed on the film.

Figure 4.1 Data collection system

In the following sections we describe the data base collected from
normal and pathologic subjects. We also describe the collection system
functional blocks in conjunction with the overall system operation.

4.1  Data  Base 

Two classes of subjects were used in this study: subjects with a
normal larynx and subjects with a pathology of the vocal folds.

4.1.1  Normal  Subject  Data  Base 

The  normal  population  in  this  study  consisted  of  four  adult  males 
(HMN,  DMK,  GPM,  AKK).    These  subjects  were  checked  by  a  speech 
pathologist  for  evidence  of  vocal  disorders  or  laryngeal  pathology  but 
none  were  found.  As  mentioned  earlier,  the  vowel  /i/  was  chosen,  since 
during  phonation  of  this  vowel  the  epiglottis  is  usually  held  back  out 
of  the  optical  pathway  of  the  vocal  folds  image  during  filming. 
However,  due  to  the  presence  of  a  laryngeal  mirror,  and  holding  down  the 
tongue  to  facilitate  high  speed  photography,  the  sound  produced  was 
closer  to  an  /a/  than  /i/  in  most  cases.   The  duration  of  filming  and 
recording  was  approximately  three  seconds. 

The  task  for  each  subject  consisted  of  phonating  the  vowel  /i/  at 
three  different  intensities  at  each  of  three  different  fundamental 
frequencies.  The  target  fundamental  frequencies  were  125  Hz,  170  Hz, 
and  340  Hz.  Each  subject  tried  to  produce  these  fundamental  frequencies 
by  matching  a  pure  tone  of  the  corresponding  frequency  provided  on  a  set 
of headphones. At each target fundamental frequency, the subjects
phonated at three intensities: a "comfortable" level, a level
approximately 4 dB above it, and a level approximately 4 dB below it.
The output intensities were measured using a General Radio Co.
type  1551-C  sound  level  meter.  Thus,  there  were  nine  tasks  for  each 
normal  subject  for  a  total  of  thirty-six  tasks. 

4.1.2  Patient  Data  Base 

Photographing  subjects,  in  general,  is  a  difficult  task.  The 
presence  of  a  pathology  introduces  additional  constraints  on  high  speed 
filming  of  these  subjects.  Many  patients  are  unable  to  phonate  at 
different  fundamental  frequency  ranges  when  a  laryngeal  mirror  is  placed 
inside  their  mouth.  In  many  cases,  the  pathology  itself  restricts  the 
frequency  range  of  the  patient's  vocal  folds. 

In  our  study  we  were  able  to  recruit  four  patients  for  high  speed 
photography.  The  patient  population  consisted  of  one  adult  female  (DJB) 
and  three  adult  males  (LMB,  GTS,  MXR).  Using  traditional  methods,  each 
patient's  condition  was  diagnosed  by  an  otolaryngologist  and  a  speech 
pathologist  at  the  ENT  clinic  at  the  University  of  Florida.  Table  4.1 
lists the tasks the patients performed and the corresponding
diagnoses.

The  task  performed  by  each  patient  was  tailored  to  his  or  her 
individual  capabilities.  Care  was  taken  so  as  not  to  subject  the 
patients  to  undue  pain  or  discomfort.  Each  patient  was  asked  to  phonate 
the  vowel  /i/  as  best  as  they  could.  Again,  the  produced  phonation 
sounded  more  like  an  /a/  in  most  cases.  It  is  clear  that  we  did  not 
have  as  much  control  of  the  tasks  performed  by  patients  compared  to  the 
tasks performed by normal subjects.

TABLE 4.1
SUBJECTS WITH PATHOLOGY

Subject  Sex  Task 1                   Task 2          Comments

DJB      F    Modal pitch /i/          High pitch /i/  Unilateral nodule

LMB      M    High pitch intermittent  High pitch      Unilateral polyp with companion
              sequence /i/             sequence /i/    bulge on the other fold
                                                       results in some loss of voice

GTS      M    Exhalation /i/           Exhalation /i/  Bilateral paralysis

MXR      M    Modal /i/                None            Voice production problem,
                                                       loss of voice

4.2 High Speed Photography

The  high  speed  camera  used  in  this  study  is  a  Fastax  Model  WF14. 
The  maximum  exposure  rate  of  this  camera  is  8000  frames/sec.  The  rate 
of  exposure  is  primarily  dependent  on  the  voltage  applied  to  the  drive 
motor  of  the  camera.  In  this  study,  this  voltage  was  adjusted  such  that 
an  exposure  rate  of  5000  frames/sec  was  achieved  before  the  last  150-200 
frames  of  the  film. 

The  camera  has  two  lens  systems  to  facilitate  the  simultaneous 
filming  of  two  images  on  the  film  frames.  This  feature  was  fully 
utilized  in  our  study  as  will  be  discussed  shortly. 

The  technique  of  ultra-high-speed  photography  of  the  vibrating 
vocal  folds  is  described  in  [72,73].  Photographing  the  vocal  folds  is 
accomplished  by  inserting  a  laryngeal  mirror  inside  the  subject's  mouth 
and  placing  it  at  the  back  of  the  pharynx.  Unfortunately,  many  subjects 
cannot  overcome  the  gag-reflex  produced  by  this  setup,  and  only  a 
selected  group  of  subjects  can  be  used  for  laryngeal  photography.  The 
location of the folds inside the larynx allows a very small amount of
light  to  be  present  at  any  time.  This  problem  is  solved  using  a  high 
intensity  incandescent  lamp  having  a  color  temperature  of  3200°K  to 
illuminate  the  vocal  folds  during  vibration.  The  light  beam  passes 
through  two  condenser  lenses  as  well  as  a  water  cell  to  remove  the 
infrared  and  ultraviolet  portion  of  the  light  spectrum.  This  serves  to 
protect  the  subject's  vocal  folds  and  other  tissues  from  excessive  heat 
or  ultraviolet  light. 

The  laryngeal  mirror  reflects  the  light  at  90°  downward  onto  the 
vocal  folds.  The  image  of  the  folds  is  reflected  back  by  the  laryngeal 
mirror through one of the two lens systems to the high speed camera (see
Figure 4.1). The folds' image is focused manually by adjusting the
camera  lens  system.  As  the  subject  phonates,  the  details  of  the  folds' 
vibration  are  captured  on  the  film. 

Using the same lens system, absolute measurement of the folds'
vibration (glottal area) is accomplished by positioning a 0.1 square
inch  grid  in  the  focal  plane  of  the  vocal  folds'  image,  as  shown  in 
Figure  4.1.  The  grid  image  is  positioned  to  lie  in  the  corner  of  the 
film  frames,  and  does  not  interfere  with  the  folds'  image. 

The  second  lens  system  is  specifically  designed  to  photograph  an 
oscilloscope  face.  The  EGG  signal  and  two  other  timing  signals  (to  be 
discussed  shortly)  are  photographed  through  this  lens  system.  The 
oscilloscope  trace  of  the  EGG  signal  is  positioned  to  lie  along  one  edge 
of  the  film.  The  other  traces  of  the  timing  signals  lie  along  the  other 
edge  of  the  film.  Again,  care  is  taken  such  that  none  of  the  signal 
traces  interfere  with  the  vocal  fold  image.  Since  the  two  lens  systems 
of  the  camera  are  at  a  90°  angle  with  each  other,  the  three  oscilloscope 
traces  appear  on  a  film  frame  that  is  displaced  five  frames  behind  the 
film  frame  recording  the  corresponding  vocal  folds  image. 

Two  different  types  of  high  speed  films  were  used.  These  were 
black  and  white  Kodak  7277  4-X  reversal  film  and  the  color  Kodak 
Ektachrome  7250  high  speed  video  news  film. 

Accurate  time  synchronization  between  the  speech,  EGG,  and  the 
laryngeal  film  frames  is  unique  and  essential  to  our  study.  In  order  to 
achieve  this  synchronization,  a  special  time  code  generator  was  designed 
[36]. The time code generator provides three timing signals: a 10 Khz
square wave that is recorded on the second channel of both tape
recorders, a 5 Khz square wave derived from and in phase with the 10 Khz
signal, and an 8 bit counter signal. The latter two timing signals are
inscribed  on  the  laryngeal  film  frames  as  described  earlier. 

The  high  speed  camera  is  an  electromechanical  device.  The  camera 
motor  requires  a  short  amount  of  time  to  achieve  the  desired  speed. 
Hence, the film exposure rate varies from zero frames/sec at the
beginning of filming to 5000 frames/sec in the latter part of the
100-foot high speed film. The 5 Khz square wave signal is photographed on
the  edge  of  the  film.  This  signal  is  used  to  monitor  the  film  exposure 
rate  (speed),  among  other  things.  The  constant  5000  frames/sec  rate  is 
achieved  when  each  cycle  of  this  signal  is  aligned  with  one  frame  of  the 
film.  This  method  was  used  to  accurately  mark  the  constant  5000 
frames/sec  exposure  rate  region.  We  usually  obtain  from  150-200  frames 
of  film  in  this  region. 
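
The relation between the timing trace and the exposure rate can be sketched as follows. This is a minimal illustration, assuming the operator has counted the film frames lying between successive counter codes (that is, per 100 cycles of the 5 Khz signal). The function names, the example counts, and the 2% tolerance are hypothetical and are not part of the original system.

```python
TIMING_HZ = 5000        # timing signal photographed on the film edge
CYCLES_PER_CODE = 100   # the counter code appears once every 100 timing cycles


def exposure_rate(frames_between_codes):
    """Estimate frames/sec from the frames counted between two counter codes."""
    # 100 timing cycles last 100 / 5000 = 0.02 s, so rate = frames / 0.02 s.
    return frames_between_codes * TIMING_HZ / CYCLES_PER_CODE


def constant_rate_intervals(frame_counts, target=5000.0, tolerance=0.02):
    """Indices of code-to-code intervals whose rate is within tolerance of target."""
    return [i for i, n in enumerate(frame_counts)
            if abs(exposure_rate(n) - target) <= tolerance * target]


# Hypothetical frame counts while the camera motor accelerates.
counts = [12, 35, 61, 83, 95, 99, 100, 100]
print([exposure_rate(n) for n in counts])
print(constant_rate_intervals(counts))
```

With one frame per timing cycle, 100 frames between codes corresponds to 5000 frames/sec; fewer frames between codes indicate that the motor has not yet reached speed.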

The 8 bit counter signal is used to locate a specific frame in the
100-foot film. The counter is incremented by one, and its value is
output, once every 100 cycles of the 5 Khz signal. The counter bits are
shifted out serially at a clock rate of 40 Khz, such that the 8 bits are
time aligned with every 100th cycle of the 5 Khz signal. The
oscilloscope trace of the 8 bit counter
is  located  directly  below  the  5  Khz  signal  trace  on  the  edge  of  the 
film.   The  proximity  of  the  two  traces  helps  in  resolving  ambiguities 
that  frequently  arise  during  measurements  of  the  film  frames.   Except 
during  the  100th  5  Khz  cycle,  the  counter  output  signal  on  the  film  is 
set  to  zero. 
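
The timing figures quoted above can be checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text (5 Khz timing signal, 40 Khz bit clock, 8 bits, one code per 100 cycles); the variable names are illustrative.

```python
TIMING_HZ = 5000        # timing signal on the film
BIT_CLOCK_HZ = 40000    # serial shift clock for the counter bits
COUNTER_BITS = 8
CYCLES_PER_CODE = 100

timing_cycle_s = 1 / TIMING_HZ                  # duration of one 5 Khz cycle
code_duration_s = COUNTER_BITS / BIT_CLOCK_HZ   # time needed to shift out 8 bits
codes_before_wrap = 2 ** COUNTER_BITS           # number of distinct code values
wrap_time_s = codes_before_wrap * CYCLES_PER_CODE / TIMING_HZ

print(timing_cycle_s, code_duration_s)   # both are 200 microseconds
print(codes_before_wrap, wrap_time_s)    # 256 codes cover 5.12 s of filming
```

The eight bits therefore occupy exactly one cycle of the 5 Khz signal, and the counter does not repeat within the roughly three seconds of filming reported in Section 4.1.1.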

Thus, to locate any frame on the film, say N, we can use the
following formula:

$$N = 100k + j$$

where

$k = \lfloor N/100 \rfloor$ is the number of times the counter bits are shifted out

and

$j = N \bmod 100$ is the remainder.

So, one will locate the kth output of the 8 bit counter and count j
frames past it to locate the desired frame. Usually, the reverse
operation is carried out. The frames are counted backward from the
starting frame of the constant region of 5000 frames/sec to the last 8
bit counter code on the film. Let the number of frames counted be j
frames, and let the code value be k; then the frame is the [100(k+1) +
j]th frame. We use (k + 1) instead of k because the initial 8 bit code
shifted out after the first 100 5 Khz cycles is zero and not one.
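
The frame-location rule can be written as a pair of small functions. This is a sketch under the assumption that one film frame corresponds to one 5 Khz cycle; the function names are hypothetical.

```python
CYCLES_PER_CODE = 100


def frame_number(code_value, frames_after_code):
    """Frame index N = 100(k + 1) + j for code value k and j frames counted past it."""
    if not 0 <= code_value <= 255:
        raise ValueError("an 8 bit code lies between 0 and 255")
    return CYCLES_PER_CODE * (code_value + 1) + frames_after_code


def locate(frame):
    """Inverse mapping: (code value k, offset j) for a frame at or beyond frame 100."""
    outputs, offset = divmod(frame, CYCLES_PER_CODE)
    # The first code shifted out has the value zero, hence the subtraction.
    return outputs - 1, offset


print(frame_number(12, 37))
print(locate(1337))
```

For example, 37 frames past the code with value 12 is frame 100(12 + 1) + 37 = 1337.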

The  10  Khz  timing  signal  recorded  on  both  tape  recorders  is  used  to 
provide  the  external  sampling  clock  for  the  A/D  converter  when 
digitizing  the  speech  and  EGG  signals.  The  5  Khz  signal  photographed  on 
the film is derived from and in phase with this 10 Khz signal. So the
number of samples of the EGG signal corresponding to the [100(k+1) + j]
cycles of the 5 Khz signal is 2[100(k+1) + j]. However, the
corresponding number of samples for the speech signal is 2[100(k+1) +
j] + d, where d is the number of samples accounting for the propagation
delay of the speech signal from the glottis to the microphone.
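
The mapping from film position to sample index can be sketched as below. The sound speed of 350 m/s is an assumed round figure, not a value from the text, and the function names are hypothetical. With a glottis-to-microphone distance of about 11 cm (Section 4.3), this assumption gives a delay d of roughly three samples at 10 Khz.

```python
SAMPLE_HZ = 10000   # external A/D clock recorded on tape
TIMING_HZ = 5000    # timing signal photographed on the film


def egg_sample_index(code_value, frames_after_code):
    """Samples elapsed at the [100(k + 1) + j]th cycle of the 5 Khz signal."""
    cycles = 100 * (code_value + 1) + frames_after_code
    return (SAMPLE_HZ // TIMING_HZ) * cycles


def propagation_delay_samples(distance_m, sound_speed_m_s=350.0):
    """Acoustic delay d in samples; the sound speed is an assumption."""
    return distance_m / sound_speed_m_s * SAMPLE_HZ


def speech_sample_index(code_value, frames_after_code, delay_samples):
    """Speech lags the EGG by the propagation delay d."""
    return egg_sample_index(code_value, frames_after_code) + delay_samples


print(egg_sample_index(12, 37))
print(propagation_delay_samples(0.11))
```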

Recently,  we  modified  the  time  code  generator  circuit  to  interface 
with  the  newly  acquired  Vetter  FM  tape  recorder.  The  modification 
includes  a  highly  stable,  crystal-based  oscillator  to  eliminate  the 
drift  in  frequency  exhibited  by  the  old  RC  oscillator.  Also,  the  10  Khz 
timing  signal  was  replaced  by  a  20  Khz  square  wave  signal.  Now  the 
speech and EGG channels can be sampled using the same sampling
frequency. The other timing signals, namely the 5 Khz and the 8 bit
counter  signals,  were  not  modified.  The  overall  operation  of  the  time 
code  generator  is  essentially  the  same  as  discussed  earlier. 

The  film  frames  analyzed  usually  lie  at  the  end  of  the  spool  of 
film.  Thus,  the  timing  signals,  used  to  align  the  recorded  speech  and 
EGG  signals  with  the  traced  EGG  and  area  function  measured  off  the  film, 
need  not  start  at  the  beginning  of  the  film.  So  a  delay  circuit  was 
added  to  the  time  code  generator  such  that  the  timing  signals  start 
shortly  after  the  film  has  already  started.  This  feature  greatly 
reduced  the  transient  artifact  that  occurred  in  the  timing  signals  when 
the  camera  motor  was  switched  on. 

4.3  Speech  and  EGG  Signals  Recording 

The  speech  signal  was  recorded  using  a  directional  hearing  aid 
microphone  coupled  directly  to  one  channel  of  a  stereo  tape  recorder  or 
an  FM  tape  recorder.  The  microphone  was  attached  to  the  laryngeal 
mirror  handle  and  positioned  inside  the  mouth  to  eliminate  the  noise 
pickup  of  the  camera  motor.  The  distance  of  the  glottis  from  the 
microphone  varies  from  subject  to  subject,  but  was  approximately  11  cm 
in  most  cases.  The  audio  bandwidth  of  the  microphone  has  been  measured 
to  be  about  6  Khz  with  a  slight  peak  at  4  Khz.  The  tape  recorders  used 
were  Revox  A77,  Teac  A-2060  or  a  Vetter  model  B  FM  instrumentation 
recorder. 

The EGG signal was obtained using an electroglottograph designed by
D. Teany and manufactured by Synchrovoice Associates. This device was
modified to give a more stable output. Figure 4.2 illustrates the
linear phase filter circuit used for this purpose. The EGG signal was
connected to one channel of a Sony model TC530 stereo tape recorder or a
Vetter model B FM instrumentation recorder. The rise and fall of the
electroglottograph was adjusted using a square wave calibration circuit
[11].

R1-188K  R2-278K  R3-279K  RH-228K  RS«=S8K  R6-276K  R7-12K  R0-276K  R9-100K  R10-1S0K  RI1-I.2K
R12-1.2K  R13-1.2K  RM-8.2K  RIS-I.2K  R16-22K
CI=C2-C5-.1UF  C3-.B15UF  C4-.833UF

Figure 4.2 Trend removal filter for the electroglottograph.

Synchronization  of  the  high  speed  film,  EGG,  and  the  speech  signal 
requires  the  presence  of  a  synchronizing  timing  signal.  Originally,  the 
four  channel  Vetter  FM  recorder  was  not  available.  The  other  available 
stereo recorders, the Revox A77, Teac A-2060, and the Sony TC530 were
used.  Since  each  one  of  these  recorders  has  only  two  input  channels,  a 
10  Khz  square  wave  signal  was  input  simultaneously  to  one  channel  on 
each  recorder,  while  the  other  channel  was  used  for  recording  the  speech 
or EGG signal. This undoubtedly increased the potential for errors in the
recording system. The stereo tape recorders were run at 7.5 ips to
obtain a flat frequency response from 50 Hz to 5 Khz. The problem of
recording on separate machines was solved once we obtained the four
channel Vetter FM recorder. Having all signals, including the timing
signal, on one tape recorder greatly reduced many potential errors both
at recording time and at processing time; for example, it was easier to
locate corresponding signals, and variations between instruments were
eliminated. The Vetter recorder was run at 15 ips to obtain a flat
frequency response from 0 Hz to 4500 Hz on the speech and EGG channels,
and a flat frequency response from 200 Hz to 40 Khz on the timing
signal channel. The effects of bandpass filtering on the timing channel
were corrected by the addition of a simple Schmitt trigger circuit.
This additional circuit restored the square shape of the timing signal,
which is needed to externally trigger the A/D converter.

4.4 Data Measurement and Preprocessing

4.4.1  Speech  and  EGG  Signals 

The  10  Khz  signal  recorded  on  the  second  channel  of  both  tape 
recorders  was  used  as  the  external  sampling  clock  for  the  analog  to 
digital  (A/D)  converter.  The  A/D  converter  board  is  plugged  into  the 
backplane  of  a  Data  General  Nova  4  minicomputer.  The  A/D  is  capable  of 
sampling  two  channels  of  data  at  a  combined  sampling  rate  of  more  than 
20  Khz.  Due  to  the  limited  bandwidth  of  the  tape  recorders,  the  10  Khz 
timing  signal  was  passed  through  a  waveshaping  circuit  to  obtain  a  clean 
square  waveform.  However,  small  variations  in  the  tape  speed  and  jitter 
in  the  waveshaping  circuit  are  sufficient  to  introduce  errors  in 
synchronization  between  the  various  signals.  Using  the  FM  tape  recorder 
in  the  latter  part  of  the  study  eliminated  these  problems  to  a  great 
degree. 
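
The waveshaping step can be illustrated in software with a hysteresis comparator, the digital counterpart of a Schmitt trigger. The circuits used in the study are analog hardware, so this is only a sketch of the principle. The thresholds, the simulation rate, and the ripple added to the test signal are all assumptions chosen for illustration.

```python
import math


def schmitt_trigger(samples, low=-0.2, high=0.2):
    """Hysteresis comparator: output rises above `high`, falls below `low`."""
    state = 0
    output = []
    for x in samples:
        if state == 0 and x > high:
            state = 1
        elif state == 1 and x < low:
            state = 0
        output.append(state)
    return output


def rising_edges(binary):
    """Indices at which the squared-up signal goes from 0 to 1."""
    return [i for i in range(1, len(binary))
            if binary[i - 1] == 0 and binary[i] == 1]


# A band-limited 10 Khz timing signal resembles a sinusoid; add a small ripple.
RATE = 200000
signal = [math.sin(2 * math.pi * 10000 * n / RATE)
          + 0.1 * math.sin(2 * math.pi * 70000 * n / RATE)
          for n in range(200)]
edges = rising_edges(schmitt_trigger(signal))
print(edges)
```

Because the ripple is smaller than the gap between the two thresholds, the output switches once per cycle and the rising edges remain evenly spaced, which is what an external A/D clock requires.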

Prior  to  digitization,  the  speech  and  EGG  signals  were  passed 
through an analog lowpass filter with a cutoff frequency of 5 Khz. Once
the EGG and speech signals are digitized, two other problems have to be
corrected: tape recorder distortion, and power line and high frequency
noise.

Tape recorder distortion results from the capacitor coupling
normally used in tape recorders, which introduces both phase and
magnitude distortion in the low frequency region below 200 Hz. This
distortion  is  not  visible  in  the  speech  waveform  but  manifests  itself  as 
a  downward  slope  in  the  EGG  signal  during  the  glottal  open  phase. 
Berouti  [21]  proposed  a  solution  to  this  problem.  Briefly,  the 
recording and playback system is considered to be a linear, time
invariant system. Let the transfer function of this system be
$H(\omega)$. We can derive $H(\omega)$ using a reference signal, usually
a square wave. Let $D(\omega)$ be the Fourier transform of the recorded
and played back signal, and $U(\omega)$ the Fourier transform of the
original and undistorted signal; then

$$U(\omega) = \frac{D(\omega)}{H(\omega)}$$

so we can restore $U(\omega)$ by multiplying $D(\omega)$ by the inverse of $H(\omega)$. In
the  case  of  the  EGG  signal,  the  traced  EGG  off  the  film  is  used  as  the 
reference  signal.  Figure  4.3  illustrates  this  method  and  shows  examples 
of  the  EGG  before  and  after  correction.  The  use  of  the  FM  recorder 
eliminated  this  problem. 
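
The correction amounts to estimating the system response from a known reference and dividing it out in the frequency domain. The sketch below is a toy illustration, not the procedure of [21]: the distortion is a hypothetical two-tap circular filter standing in for the tape recorder, and the reference is a single broadband pulse so that every frequency bin carries energy (a square wave reference would leave some bins empty and would need extra handling).

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform (O(n^2), adequate for a short demo)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def toy_recorder(x):
    """Stand-in distortion: y[n] = x[n] - 0.5 x[n-1], applied circularly."""
    return [x[n] - 0.5 * x[n - 1] for n in range(len(x))]


def estimate_response(reference, played_back):
    """H = D_ref / U_ref, measured with a known reference signal."""
    return [d / u for u, d in zip(dft(reference), dft(played_back))]


def restore(distorted, response):
    """U = D / H, followed by the inverse transform."""
    return idft([d / h for d, h in zip(dft(distorted), response)])


N = 32
reference = [1.0] + [0.0] * (N - 1)
H = estimate_response(reference, toy_recorder(reference))
original = [max(0.0, 1.0 - abs(n - 10) / 6) for n in range(N)]  # hypothetical pulse
recovered = restore(toy_recorder(original), H)
error = max(abs(a - b) for a, b in zip(original, recovered))
print(error)
```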

The  60  Hz  power  line  noise  and  the  high  frequency  noise  were 
removed  from  the  speech  and  EGG  signals  using  a  bandpass  filter.  The 
filter  used  is  a  351  point,  linear  phase  FIR  filter  described  in  [74]. 
The  transfer  function  of  the  filter  is  shown  in  Figure  4.4.  Again,  most 
of  the  FM  tape  recordings  did  not  contain  these  noises  and  did  not  need 
filtering. 

Figure 4.3 Correction system for tape recorder phase distortion.

4.4.2 High Speed Film Data

In this section we describe the method used to measure different
parameters associated with vibration of the vocal folds. It is
important to note that these measurements are done in a region of high
exposure rate of the film frames. In our study the pitch frequency,
which is directly related to the frequency of vocal fold vibration, never
exceeds 350 Hz. We were able to attain film speeds of up to 5000
frames/sec, so we satisfy the sampling theorem and should have
little or no aliasing in the measured data.
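
As a quick check of the sampling argument (a worked illustration, not a calculation given in the original text), the Nyquist frequency at 5000 frames/sec and the number of frames per glottal cycle at the highest observed pitch and at the lowest target pitch are:

```latex
\[
f_{\text{Nyquist}} = \frac{5000}{2} = 2500\ \text{Hz} > 350\ \text{Hz},
\qquad
\frac{5000}{350} \approx 14.3\ \text{frames per cycle},
\qquad
\frac{5000}{125} = 40\ \text{frames per cycle}.
\]
```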

The  first  step  in  the  measurement  is  to  locate  the  region  of 
constant  5000  frames/sec.  Using  an  Athena  224-ES  stop  frame  projector, 
we  marked  the  segment  of  the  film  where  the  number  of  5  Khz  square  wave 
cycles  between  successive  8  bit  counter  output  was  100  cycles  or  close 
to  it.  This  represents  a  film  speed  of  approximately  5000  frames/sec. 
Starting  with  this  region,  150  frames  were  chosen  and  marked  for 
measurement. 

Next, the glottal area between the vocal folds in each film frame
was digitized. Over the years a number of semi-automated,
computerized systems for digitizing the glottal area have been developed
and used at the Mind-Machine Interaction Research Laboratory at the
University of Florida [72,73,75,76]. The system consists of a Vidicon
TV camera attached to a Spatial Data Systems EyeCom 108PT image
processing terminal. This terminal is interfaced to a Data General Nova
4 minicomputer and has the capability of displaying video images along
with superimposed graphics. The display screen is divided into 640x480
coordinate locations. A cursor, controlled by a joystick, can be
moved to any desired location on the screen. The cursor coordinates are
then transferred to the computer. Also, the terminal is capable of
digitizing images with a spatial resolution of 640x480 pixels and an
intensity resolution of 256 gray levels.

Measurements of the laryngeal film are carried out using the
following procedure. Using an Athena 224-ES stop frame projector, the
region of constant 5000 frames/sec is first located. Each frame in this
region is projected onto a 45° translucent screen. The glottal image
projected on this screen is scanned by the TV camera and displayed on
the EyeCom display terminal. Using a joystick cursor, the operator
measures the glottal width at five preselected locations and also the
length of the glottis. The 0.1 inch grid is also measured and stored
for scaling. The five marked points on the image are connected by
straight lines by the software program to approximate the glottal
boundary. The glottal area bounded by these lines is computed by the
program. This approximation, no doubt, introduced noise in the measured
area, length, and glottal width.
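
The area computation can be sketched with the shoelace formula for a polygon whose vertices are the marked boundary points. This is an illustration only: the original program is not reproduced here, the function names are hypothetical, and the grid is assumed to have a side of 0.1 inch so that one measured grid side gives the pixel-to-inch scale.

```python
def polygon_area(points):
    """Shoelace formula for a simple polygon given as ordered (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def glottal_area_sq_inch(boundary_pixels, grid_side_pixels, grid_side_inch=0.1):
    """Scale a pixel-coordinate polygon area using the measured grid side."""
    inch_per_pixel = grid_side_inch / grid_side_pixels
    return polygon_area(boundary_pixels) * inch_per_pixel ** 2


# Hypothetical diamond-shaped glottis: 20 pixels long, 10 pixels wide at mid-length.
boundary = [(0, 0), (10, -5), (20, 0), (10, 5)]
print(polygon_area(boundary))
print(glottal_area_sq_inch(boundary, grid_side_pixels=20))
```

Because the boundary is approximated by straight segments between a few points, the computed area changes in steps as the operator's cursor placement varies, which is one source of the measurement noise noted above.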

The  EGG  trace  on  the  film  was  also  digitized  using  this  system. 
Two  points  are  measured  on  the  EGG  trace  for  every  frame.  The  EGG 
signal  obtained  from  this  procedure  is  referred  to  as  the  traced  EGG. 
Here,  also,  the  traced  EGG  signal  is  noisy  due  to  the  limited  spatial 
resolution  of  the  EyeCom  terminal. 

The  limited  resolution  of  the  image  system  and  the  procedure  used 
in  digitizing  the  various  signals  introduce  noise  in  these  measured 
signals,  as  mentioned  earlier.  Consequently,  these  measurements  have  to 
be  suitably  smoothed.  Linear  smoothing  techniques  alone  are  not 
suitable  for  preserving  important  abrupt  changes  in  the  signals.  In  our 
case,  many  of  these  signals  have  abrupt  or  sharp  transitions  due  to  the 
nature  of  the  vibrations  of  the  vocal  folds.  Hence,  a  combination  of 
nonlinear median smoothing and linear smoothing as described in [77] was
used.
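
A combined smoother of this kind can be sketched as a running median followed by a short linear filter. The window length of five and the three-point weights (1/4, 1/2, 1/4) are assumptions made for illustration; the exact scheme of [77] may differ.

```python
from statistics import median


def running_median(x, width=5):
    """Running median with the end samples repeated to pad the edges."""
    half = width // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [median(padded[i:i + width]) for i in range(len(x))]


def hanning3(x):
    """Three-point linear smoother with weights 1/4, 1/2, 1/4."""
    padded = [x[0]] + list(x) + [x[-1]]
    return [0.25 * padded[i] + 0.5 * padded[i + 1] + 0.25 * padded[i + 2]
            for i in range(len(x))]


def smooth(x, width=5):
    """Median smoothing to reject outliers, then light linear smoothing."""
    return hanning3(running_median(x, width))


# Hypothetical trace: an isolated measurement error followed by an abrupt transition.
trace = [0, 0, 0, 9, 0, 0, 0, 5, 5, 5, 5, 5]
print(running_median(trace))
print(smooth(trace))
```

The median stage removes the isolated outlier while leaving the step in place, and the linear stage then softens the step by only one sample on each side.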

4.5 Synchronization of the Measured Data

Section 4.2.1 laid out the theoretical procedure for aligning the
different measured signals using the recorded and photographed timing
signals. However, synchronization errors persisted between the measured
signals. These are primarily due to sampling errors during
digitization. The traced EGG obtained from the films, however, is in
perfect alignment with the film data. We can utilize this feature to
fine tune synchronization between the measured data using the following
procedure:

(1)  The  digitized  EGG  from  the  tape  recorder  was  shifted  to 
align  with  the  traced  EGG  signal.  This  typically 
involved  shifts  of  less  than  ten  samples. 

(2)  The  recorded  EGG  and  speech  signals  are  assumed  to  be 
perfectly  aligned.  Therefore,  the  speech  signal  is 
shifted  by  the  same  number  of  samples  as  the  recorded 
EGG.  Moreover,  to  account  for  the  acoustic  propagation 
delay  from  the  glottis  to  the  microphone,  the  speech 
signal  is  further  shifted  by  four  samples. 

Perfect  alignment  between  the  speech  and  recorded 
EGG  is  valid  in  the  case  of  FM  recording.  However,  this 
assumption  is  probably  violated  when  using  stereo  tape 
recorders.  Furthermore,  the  compensation  for  the 
acoustic  delay  could  be  in  error  by  as  many  as  three 
samples. 
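
The two alignment steps can be sketched as a search for the shift that best matches the recorded EGG to the traced EGG, followed by the same shift plus four samples applied to the speech. The text does not say how the shift was found, so the correlation search below is an assumption, as are the function names and the synthetic test signals; both signals are assumed to lie on the same sample grid.

```python
def best_shift(reference, signal, max_shift=10):
    """Shift s maximizing the sum of reference[n] * signal[n + s]."""
    def score(s):
        return sum(reference[n] * signal[n + s]
                   for n in range(len(reference))
                   if 0 <= n + s < len(signal))
    return max(range(-max_shift, max_shift + 1), key=score)


def shift_signal(x, s):
    """Advance x by s samples (output[n] = x[n + s]), padding with zeros."""
    n = len(x)
    return [x[i + s] if 0 <= i + s < n else 0.0 for i in range(n)]


ACOUSTIC_DELAY = 4  # samples, as stated in step (2)

# Hypothetical traced EGG pulse and a recorded copy that lags it by 3 samples.
traced = [max(0.0, 1.0 - abs(n - 30) / 8) for n in range(64)]
recorded = [0.0, 0.0, 0.0] + traced[:-3]
shift = best_shift(traced, recorded)
aligned_egg = shift_signal(recorded, shift)
speech_shift = shift + ACOUSTIC_DELAY   # applied to the speech channel
print(shift, speech_shift)
```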

4.6  Error  Sources 

4.6.1  Film  Data 

There  are  three  primary  sources  of  errors  associated  with  the  data 
measured  from  the  high  speed  films: 

(1) Incomplete or poor exposure of the vocal folds: One
always desires the entire folds and glottal area to be
visible on the film frames. However, in many instances
this is not the case. Usually, the anterior portion of
the folds is overshadowed by the epiglottis. In this
situation, the procedure was to exclude the invisible
portion rather than extrapolate the glottal contour
over it. This procedure was carried
throughout the entire set of high speed films. This, of
course, introduces a systematic error throughout the
measured film data.

(2) Limited resolution of the measurement system: The
relatively poor contrast of some of the films does not
allow taking full advantage of the display terminal
resolution, which is itself somewhat limited.

(3) Operator approximation errors: Digitizing high speed
films is a formidable task. It is estimated that only 5
to 6 minutes (25000-30000 frames) of high speed
laryngeal film have been processed in the entire
world. More than one operator, therefore, is usually
required to perform this task. In our system, the
operator subjectively locates the cursor at the end
points of the opening of the vocal folds. This
introduces variability between film frames measured by
different operators. Other errors are discussed more
fully in [73].

4.6.2  Digitized  Tape  Recordings 

Initially,  two  stereo  tape  recorders  were  used  when  recording  both 
the EGG and speech signals. This introduced the following errors:

(1) Synchronization errors due to the sampling process.

(2)  Errors  due  to  the  tape  recorder  distortion. 

(3)  Tape  recorder  speed  variation. 

The  traced  EGG  signal  was  used  to  correct  the  recorded  EGG  signal 
and  significantly  reduced  the  errors  in  the  EGG  signal.  On  the  other 
hand,  the  speech  signal  cannot  be  as  adequately  corrected. 

Using  the  Vetter  FM  tape  recorder  eliminated  the  first  two  problems 
but  not  the  third  one.  However,  the  FM  recorder  has  fairly  stable 
recording  speeds. 

4.7  Conclusions 

This  chapter  explained  our  data  collection  and  measurement 
system.  The  collected  data  base  includes  data  from  normal  and  abnormal 
subjects.  The  normal  data  base  consisted  of  thirty-six  tasks  performed 
by  four  subjects;  nine  tasks  per  subject.  The  data  base  includes 
digitized  recorded  speech  and  EGG  signal  and  traced  EGG  and  area 
function  measured  off  the  film.  The  patient  data  base  consists  of  eight 
tasks  performed  by  four  patients  (refer  to  Table  4.1). 

As  mentioned  earlier,  the  data  collection  and  high  speed 
photography  of  patient  subjects  proved  to  be  difficult.  Although  we 
filmed  more  patients  than  indicated  by  Table  4.1,  we  had  to  discard 
their  data  because  their  films  were  not  measurable.  The  discarded  films 
were  the  ones  with  poor  film  exposure  or  obstructions  to  the  folds' 
image  on  the  film  frames.  Also,  sources  of  error  in  the  digitized 
recorded  signals  and  signals  measured  off  the  film  were  discussed. 


CHAPTER  5 
MEASUREMENTS  OF  VOCAL  FOLD  VIBRATION  PARAMETERS 


5.1  Introduction 

One  important  aspect  of  this  research  is  to  provide  quantitative 
measurements of the vocal fold vibratory parameters. These parameters,
listed in Section 2.2.2.1, include the area function, the fundamental
frequency F0, the opening and closing phases, the closed phase, the open
quotient (OQ), the speed quotient (SQ), and the speed index (SI). The
literature  usually  provides  only  a  qualitative  description  of  these 
parameters.  Rarely  does  one  find  quantitative  or  numerical  values 
associated  with  these  parameters.  In  this  chapter  we  will  provide 
actual  measurements  of  the  parameters  made  from  our  data  base.  We  also 
present  a  comparison  between  values  obtained  from  subjects  with  a  normal 
larynx  and  values  obtained  from  subjects  with  a  laryngeal  pathology. 

The  vibratory  parameters  are  usually  measured  from  the  area 
function.  In  Chapter  4  we  discussed  the  data  collection  and  measurement 
system  we  used  to  obtain  the  synchronized  area  function,  the  EGG,  and 
the  speech  signals.  It  is  clear  from  that  discussion  that  obtaining  the 
area  function  is  a  difficult,  expensive,  time-consuming,  and  invasive 
process,  requiring  extensive  effort,  subject  training,  and  special 
operator expertise. These factors have precluded this system from being
adopted in the clinic. Consequently, the search is on for alternative
methods that are noninvasive, simple, and inexpensive and, most of all,
convey comparable information about laryngeal function that can be
used in a clinical setting.

One  such  method  is  electroglottography  and  the  electroglottographic 
(EGG)  signal.  Using  data  collected  from  normal  subjects,  Krishnamurthy 
[11]  investigated  the  relationship  between  the  EGG  signal  and  the  vocal 
fold  vibratory  events,  providing  measurements  of  some  of  the  parameters 
discussed  above.  His  main  concern  was  to  show  that  events  in  the  EGG 
signal  can  be  correlated  with  certain  events  in  the  vibratory  cycle  of 
the  vocal  folds.  His  results  demonstrated  the  validity  of  that 
conjecture. 

In  this  research  we  use  the  same  data  base  for  the  normal  subjects, 
but  we  also  use  a  data  base  collected  from  subjects  with  pathologies. 
We extend Krishnamurthy's measurements to other vocal fold vibratory
parameters  and  provide  a  comparison  between  the  two  types  of 
populations,  namely,  subjects  with  a  normal  larynx  and  the  subjects  with 
a  laryngeal  pathology  from  the  clinical  population. 

This  chapter  is  organized  in  the  following  manner:  We  first 
discuss  the  various  vibratory  parameters  and  present  measurements  made 
from  our  data  base.  Then  we  discuss  the  problem  of  using  short  data 
records  versus  long  data  records  when  processing  the  EGG  signal. 

5.2  Vocal  Fold  Vibratory  Parameters 

In  Chapter  2  and  in  this  chapter  we  have  listed  some  typical  vocal 
fold  vibratory  parameters.  We  add  to  this  list  the  closing  time  of  the 
vocal  folds  as  measured  from  the  EGG  signal.  In  this  section  we  first 
present an overall discussion of the algorithms used to process the area
and EGG signals and then we discuss the vibratory parameters and present
the pertinent results for each parameter.

Algorithms  for  measurements  of  vocal  fold  parameters.  A  number  of 
algorithms  were  developed  for  this  dissertation  to  perform  the  following 
measurements: 

(1)  vocal  fold  opening  and  closing  instants  as  measured  from  the 
EGG  and  laryngeal  films 

(2)  fundamental  (pitch)  period  measurements 

(3)  pitch  perturbation  measurements 

(4)  open  quotient  (OQ)  measurements 

(5)  speed  quotient  (SQ)  and  speed  index  (SI)  measurements  from  the 
area  function 

(6)  glottal  area  closing  time  (closing  phase)  and  opening  time 
(opening  phase) 

(7)  the  vocal  fold  closing  time  as  measured  from  the  EGG. 

The  first  four  measurements  were  carried  out  for  both  the  area  and 
EGG functions. However, the algorithms for measuring the same parameters
from the area and EGG functions are different due to differences in the
two signals.

A  computer  program  was  developed  to  implement  the  above 
algorithms.  Figure  5.1  shows  the  flow  diagram  for  this  program  for 
measurements  from  the  area  function,  and  Figure  5.2  shows  the  flow 
diagram  for  measurements  from  the  EGG  function.  The  flow  diagrams 
reveal the interaction between the different algorithms. Taking the
area function as the reference signal, the differences in measurement
results between the area and EGG functions are computed and presented as
errors in measurements made from the EGG signal.
Figure 5.1  Flow chart for EGG signal processing.

Figure 5.2  Flow chart for the relative entropy method.

We will discuss the details of some of the algorithms during the
discussion  of  the  corresponding  parameters.  The  rest  of  the  algorithms 
are  discussed  in  Appendix  A. 

5.2.1  Area  Function 

The  area  function  is  the  variation  of  the  glottal  area  with  respect 
to  time  during  the  vibration  of  the  folds.  This  is  the  most  important 
vibratory  parameter  (function),  so  much  so  that  it  is  usually  used  as  a 
means  of  describing  the  vibratory  characteristics  of  the  vocal  folds. 
Other  parameters  such  as  the  fundamental  frequency,  open  quotient,  speed 
quotient,  speed  index,  and  the  closing  time  (closing  phase)  are  usually 
measured  from  the  area  function.  Also,  other  important  events,  such  as 
the  instants  of  glottal  closure  and  of  glottal  opening,  are  identified 
in  the  glottal  cycle  using  the  area  function. 

As  mentioned  earlier,  obtaining  the  area  function  is  difficult  and 
hard  to  implement  in  a  clinical  setting.  One  goal  of  this  research  is 
to  facilitate  the  use  of  the  EGG  signal  as  a  substitute  for  the  area 
function. However, we should always remember that the EGG signal and
the area function are not equivalent. Our objective is to obtain from
the EGG the information usually taken from the area function, and to
extract additional information about the vibratory behavior of the folds
during the time they are touching each other, i.e., during the closed phase.

The  area  function  of  the  glottal  cycle  can  be  divided  into  three 
regions:  opening  phase,  closing  phase,  and  closed  phase.  The  instants 
of  opening  and  closing  of  the  folds  are  two  important  events  which 
affect  source-tract  coupling,  the  formants,  and  their  bandwidths.  These 
two events are easily identified from the area function if a closed
phase  exists.  However,  each  of  these  events  corresponds  to  a  region  in 
the  EGG  signal  instead  of  a  single  point  as  in  the  area  function.  In 
the  following  sections  we  will  discuss  a  procedure  suggested  by 
Krishnamurthy  [11]  to  detect  these  two  events  from  the  EGG  signal. 

5.2.1.1  Glottal  opening  and  closing  instants 

These  two  events  are  clearly  identified  in  the  area  function, 
namely,  the  closing  instant  is  the  moment  the  area  function  becomes  zero 
at  the  end  of  the  closing  phase  region.  The  opening  instant  is  the 
moment  the  area  function  increases  to  a  value  greater  than  zero 
following  the  closed  phase  region. 

The  beginning  of  the  closed  phase  region  corresponds  to  the  region 
of  the  rapid  fall  in  the  EGG  signal  as  shown  in  Figure  4.1.  Although  it 
is  not  possible  to  pinpoint  the  instant  of  closure  of  the  folds  in  the 
EGG  signal,  it  is  possible  to  estimate  this  point  with  a  high  degree  of 
accuracy.  We  will  present  data  and  measurements  that  substantiate  this 
claim. A method for estimating the instant of closure was suggested by
Krishnamurthy [11]. He proposed using the minimum in the differentiated
EGG (DEGG) signal to mark the moment of closure of the folds.
Rothenberg and Mahshie [78] suggested high-pass filtering the EGG signal
first, then taking the 50% amplitude of the flat top of the EGG on both
sides (opening and closing) to denote the instants of opening and
closure. We chose the differentiated EGG method as a result of
observations from our ultra-high-speed films. The change in the EGG
that is associated with the minimum in the DEGG often coincides with the
moment of closure. For a large-amplitude EGG, both methods are almost
identical; they differ when the EGG has a small amplitude, in which case
it is usually sinusoidal in shape. Chest (modal) voice register usually
provides a large-amplitude EGG. The tasks recorded in our data base
were conducted using modal voice phonation, so we chose the
differentiated EGG method throughout our analysis. Krishnamurthy's
algorithm  for  detecting  the  opening  and  closing  instants  using  both  the 
area  and  EGG  signals  can  be  found  in  [11]. 
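
The differentiated EGG method described above can be sketched in a few lines. The following is only a minimal illustration and is not the algorithm of [11]: it assumes a uniformly sampled EGG array, takes the first difference as the DEGG, and uses a hypothetical threshold parameter `frac` to isolate one negative DEGG peak per cycle. The function name, the thresholding scheme, and the assumption that the record contains at least one closure are ours.

```python
import numpy as np

def degg_instants(egg, fs, frac=0.5):
    """Estimate closure and opening instants (sample indices) from an EGG.

    egg  : 1-D array of EGG samples (falls rapidly at glottal closure)
    fs   : sampling rate in Hz
    frac : hypothetical threshold, as a fraction of the deepest DEGG minimum
    """
    egg = np.asarray(egg, dtype=float)
    degg = np.diff(egg) * fs  # first difference approximates d(EGG)/dt

    # Samples where the DEGG is strongly negative (candidate closures).
    below = np.flatnonzero(degg < frac * degg.min())
    # Group the candidates into contiguous runs, one run per glottal cycle.
    runs = np.split(below, np.flatnonzero(np.diff(below) > 1) + 1)
    # Closure instant: the DEGG minimum within each run.
    closures = np.array([run[np.argmin(degg[run])] for run in runs])
    # Opening instant: the DEGG maximum between successive closures.
    openings = np.array([a + np.argmax(degg[a:b])
                         for a, b in zip(closures[:-1], closures[1:])])
    return closures, openings
```

With a sample spacing of 0.1 msec (fs = 10 kHz), as used in the tables of this chapter, index differences convert to milliseconds by dividing by 10.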

Measurement results. Chapter 4 discussed the data collection and
measurement system. We pointed out there that small measurement errors
exist in the measured values of the area, EGG, and speech signals.
Also, the frequency headings of 125, 170, and 340 Hz in the tables are
the target frequencies that the subjects tried to produce. Using the area
function as the reference signal, we measured the instants of glottal
closure and the instants of glottal opening from the EGG using the
differentiated EGG method outlined earlier. Tables 5.1 to 5.3 and 5.5
to 5.7 present the error, in samples, between values measured from the
area function and the EGG for normal subjects, and Tables 5.4 and 5.8
present the error for measurements made from data taken from subjects
with pathologies. Tables 5.1 to 5.3 show that the method performed
fairly well when measuring the instants of closure for normal
subjects. At low fundamental voicing frequencies (125 Hz) the error
seems to increase with increasing intensity. However, for medium to
high fundamental frequencies (170 Hz, 340 Hz) the error is least at
medium intensities (70 dB nominal). The overall average error is 2.3
samples or 0.23 msec. This is also true for subjects with pathologies,
where the average error is approximately 2.6 samples, and 1.4 samples if
Table 5.1  Error in determining instant of closure from the
EGG signal for subjects with a normal larynx in
number of samples (sample = 0.1 msec)

Int.                    Low
Freq. (Hz)      125     170     340
Subj.
JMN            1.00    2.00      NA
DMK            1.00    3.48    1.60
AKK            0.67    0.75    2.40
GPM              NA    4.00    2.00

Average        0.89    2.56    2.00

Table 5.2  Error in determining instant of closure from the
EGG signal for subjects with a normal larynx in
number of samples (sample = 0.1 msec)

Int.                  Medium
Freq. (Hz)      125     170     340
Subj.
JMN            2.25    3.50      NA
DMK            3.50    0.00    2.10
AKK            1.00    2.86      NA
GPM              NA      NA    0.90

Average        2.25    2.12    1.50

Int. = Intensity; Freq. = Frequency; Subj. = Subject
Table 5.3  Error in determining instant of closure from the
EGG signal for subjects with a normal larynx in
number of samples (sample = 0.1 msec)

Int.                   High
Freq. (Hz)      125     170     340
Subj.
JMN            1.67    3.40      NA
DMK            1.25    2.20      NA
AKK            0.50      NA    1.90
GPM            7.56    6.00    3.71

Average        2.75    3.86    2.80


Table 5.4  Error in determining instant of closure from the
EGG signal for subjects with a laryngeal pathology
in number of samples (sample = 0.1 msec)

Subj.        DJB     DJB     DJB     LMB     MXR     GTS     GTS     GTS
Task           A       B       C       B       A       B       A       B
(Freq.)    (284)   (438)   (172)      NA      NA   (151)   (169)      NA

Error       1.50    7.61    0.60      NA      NA    1.50    2.00      NA
Table 5.5  Error in determining opening instant from the EGG
signal for subjects with a normal larynx in number
of samples (sample = 0.1 msec)

Int.                    Low
Freq. (Hz)      125     170     340
Subj.
JMN            4.00    3.20      NA
DMK            4.33    3.80    3.60
AKK           11.50    7.60   12.25
GPM              NA    4.25    4.10

Average        6.61    4.71    6.65


Table 5.6  Error in determining opening instant from the EGG
signal for subjects with a normal larynx in number
of samples (sample = 0.1 msec)

Int.                  Medium
Freq. (Hz)      125     170     340
Subj.
JMN            2.00   14.20      NA
DMK            2.00    3.20    2.60
AKK           11.40    4.83      NA
GPM              NA      NA    3.10

Average        5.13    7.41    2.85
Table 5.7  Error in determining opening instant from the EGG
signal for subjects with a normal larynx in number
of samples (sample = 0.1 msec)

Int.                   High
Freq. (Hz)      125     170     340
Subj.
JMN            6.25    3.40      NA
DMK            3.00    0.60      NA
AKK            6.50      NA    7.30
GPM            3.49    5.28    1.14

Average        4.81    3.09    4.22


Table 5.8  Error in determining opening instant from the EGG
for subjects with a laryngeal pathology in number
of samples (sample = 0.1 msec)

Subj.        DJB     DJB     DJB     LMB     MXR     GTS     GTS     GTS
Task           A       B       C       B       A       B       A       B
(Freq.)    (284)   (438)   (172)      NA      NA   (151)   (169)      NA

Error       8.50   19.16   19.40      NA      NA   11.25    2.60      NA
we exclude task B of subject DJB in Table 5.4. Since it was difficult
to  control  the  intensity  and  frequency  of  phonations  for  subjects  with 
pathologies,  the  phonations  were  acquired  at  comfortable  frequencies  and 
intensities. 
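
The two averages quoted for the pathological subjects can be checked directly from Table 5.4. The short sketch below skips the NA entries and converts samples to milliseconds using the 0.1 msec sample spacing given in the table captions; the pairing of tasks with subjects follows our reading of the table layout, and the variable and function names are illustrative only.

```python
SAMPLE_MS = 0.1  # one sample = 0.1 msec, as stated in the table captions

# (subject, task, closure error in samples); None marks an NA entry.
# The subject/task pairing is an assumption read off the layout of Table 5.4.
closure_errors = [
    ("DJB", "A", 1.50), ("DJB", "B", 7.61), ("DJB", "C", 0.60),
    ("LMB", "B", None), ("MXR", "A", None),
    ("GTS", "B", 1.50), ("GTS", "A", 2.00), ("GTS", "B", None),
]

def mean_error(rows):
    """Average the available errors, skipping NA entries."""
    values = [err for _, _, err in rows if err is not None]
    return sum(values) / len(values)

all_tasks = mean_error(closure_errors)
without_djb_b = mean_error(
    [r for r in closure_errors if (r[0], r[1]) != ("DJB", "B")])

print(f"all tasks:     {all_tasks:.2f} samples = {all_tasks * SAMPLE_MS:.3f} msec")
print(f"without DJB B: {without_djb_b:.2f} samples = {without_djb_b * SAMPLE_MS:.3f} msec")
```

The two means come out at about 2.6 and 1.4 samples, in agreement with the values quoted in the text.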

Tables  5.5  to  5.7  indicate  that  the  error  in  measuring  the  opening 
instants  for  normal  subjects  is  least  for  low  to  medium  fundamental 
frequencies  at  high  intensities  with  an  average  error  of  4  samples  (0.4 
msecs).  For  high  frequencies  the  error  is  least  at  medium  to  high 
intensities  with  an  average  error  of  3.5  samples  (0.35  msecs).  The 
overall  average  error  is  approximately  5  samples  (0.5  msec).  Table  5.8 
shows  that  for  subjects  with  pathologies  the  method  did  not  perform  well 
with  an  average  error  of  12  samples  (1.2  msecs). 

In conclusion, the method performs rather well for subjects with a
normal larynx and for subjects with an abnormal larynx when measuring
the  instant  of  closure,  but  not  so  well  when  measuring  the  instant  of 
opening.  This  conclusion  agrees  with  the  measurements  made  by 
Krishnamurthy  [11]. 

5.2.1.2  Opening  phase 

The  opening  phase  is  a  region  defined  in  the  area  function  as  the 
duration  from  the  instant  of  glottal  opening  to  the  maximum  of  the 
glottal  area  function.  Observations  made  from  ultra  high  speed 
laryngeal  films  reveal  that,  in  this  region,  the  folds  abduct,  moving 
away  from  each  other.  The  films  also  show  that,  during  glottal  opening, 
the  folds  separate  with  a  phase  difference  along  their  length  and 
thickness.  Once  separated,  the  folds  continue  to  move  apart  until  they 
reach  a  maximum  displacement.  In  the  area  function  this  corresponds  to 
the opening phase, where the area function increases monotonically until
it  reaches  its  maximum. 

The  EGG  signal  increases  monotonically  during  the  time  the  folds 
are  separating.  Once  the  folds  separate,  however,  the  EGG  stops 
increasing  and  stays  constant  until  the  folds  make  contact  again.  These 
observations  support  the  conjecture  that  the  EGG  is  a  measure  of  the 
contact  area  of  the  folds.  Also,  our  model  presented  in  Chapter  3 
substantiates  such  claims. 

Measurement results. Tables 5.9 to 5.11 list the measurements from
the area function of the duration of the opening phase for subjects with a
normal larynx at different frequencies and intensities of voicing. As
expected, the opening phase duration generally varies inversely with the
vibratory frequency of the folds. In the case of low intensity phonations
the opening phase duration ranges from an average of 2.36 msec at F0 =
125 Hz to an average of 1.56 msec at F0 = 340 Hz. For medium
intensities the opening phase ranges from an average of 1.87 msec at F0
= 125 Hz to an average of 1.05 msec at F0 = 340 Hz, and for high
intensities, the opening phase duration ranges from an average of 2.39
msec at F0 = 125 Hz to an average of 1.26 msec at F0 = 340 Hz.

At  medium  intensities  the  average  duration  of  the  opening  phase 
across  subjects  stays  almost  constant  when  the  fundamental  frequency  of 
phonation  is  increased  from  low  to  medium  range.  In  contrast,  a 
substantial  change  in  the  duration  of  the  opening  phase  occurs  when 
going from a medium frequency to a high frequency of phonation. At low
intensity the opposite is true: a large change in the duration of the
opening phase occurs when going from a low to a medium frequency of




Table 5.9  Opening phase duration measured in msecs from area
function for subjects with a normal larynx for low
intensity phonation

Int.                    Low
Freq. (Hz)      125     170     340
Subj.
JMN            2.33    1.15      NA
DMK            2.80    1.75    0.96
AKK            1.96    1.80    1.96
GPM              NA    1.75    1.76

Average        2.36    1.61    1.56


Table 5.10  Opening phase duration measured in msecs from area
function for subjects with a normal larynx for
medium intensity phonation

Int.                  Medium
Freq. (Hz)      125     170     340
Subj.
JMN            1.60    2.35      NA
DMK            2.00    2.73      NA
AKK            2.00    1.20    2.05
GPM              NA      NA    1.16

Average        1.87    2.09    1.05
Table 5.11  Opening phase duration measured in msecs from the area
function for subjects with a normal larynx for high
intensity phonation

Int.                   High
Freq. (Hz)      125     170     340
Subj.
JMN            2.80    1.15      NA
DMK            2.73    2.05      NA
AKK            1.64      NA    1.20
GPM              NA    2.50    1.31

Average        2.39    1.90    1.26


Table 5.12  Opening phase duration measured in msecs from the area
function for subjects with a laryngeal pathology for
comfortable intensity phonation

Subj.        DJB     DJB     DJB     LMB     MXR     GTS     GTS     GTS
Task           A       B       C       B       A       B       A       B
(Freq.)    (284)   (438)   (172)      NA      NA   (151)   (169)      NA

Opening     1.31    1.10    2.70      NA      NA    2.13    2.00      NA
phonation, while a small change occurs when going from a medium to a
high frequency of phonation. At high intensity the change
seems to be uniform, with almost equal steps when going from a low to a
medium and from a medium to a high frequency of phonation. Table 5.12 presents
measurements from subjects with a laryngeal pathology. Again the
duration of the opening phase tends to vary inversely with the
frequency of phonation.

5.2.1.3  Closing  phase 

This  region  is  defined  as  the  duration  from  the  maximum  of  the 
glottal  area  function  to  zero  glottal  area,  i.e.,  the  instant  of  glottal 
closure. 
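
The definitions of the opening and closing phases translate directly into a measurement on one cycle of the area function. The sketch below is an illustration under simplifying assumptions rather than the program used in this work: it expects a single cycle that begins and ends in the closed phase (area exactly zero), and the function name is hypothetical. Digitized film data would normally need a small noise threshold in place of the exact zero test.

```python
import numpy as np

def phase_durations_ms(area, fs):
    """Return (opening, closing) phase durations in msec for one glottal cycle.

    area : 1-D array of glottal area samples for one cycle, starting and
           ending in the closed phase (area == 0)
    fs   : sampling rate in Hz
    """
    area = np.asarray(area, dtype=float)
    opening_instant = np.flatnonzero(area > 0)[0]   # area first exceeds zero
    peak = int(np.argmax(area))                     # maximum glottal area
    # Closing instant: first sample at or after the peak where area is zero.
    closing_instant = peak + np.flatnonzero(area[peak:] == 0)[0]
    to_ms = 1000.0 / fs
    return ((peak - opening_instant) * to_ms,
            (closing_instant - peak) * to_ms)
```

For example, at fs = 10 kHz a cycle whose area rises for 3 samples and falls for 2 samples gives an opening phase of 0.3 msec and a closing phase of 0.2 msec.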

Krishnamurthy  [11]  noted  that  although  the  area  function  appears  to 
be  almost  symmetric,  the  movement  of  the  folds  is  asymmetric.   As 
mentioned  earlier,  during  the  opening  phase,  a  phase  difference  along 
the  length  and  depth  of  the  folds  exists  until  the  folds  separate. 
However,  during  the  closing  phase,  the  folds  move  towards  each  other 
until  they  are  almost  parallel  with  a  narrow  opening  along  their 
lengths,  then  "closure  occurs  almost  simultaneously  along  the  entire 
mid-sagittal  line."  The  closing  phase  region  follows  the  opening  phase 
region. The EGG assumes a constant value after the folds separate
during the opening phase, and maintains this value during the closing
phase until the folds make contact again. The EGG then exhibits a rapid fall
due to the almost simultaneous contact along the length of the folds.
This  observation  adds  more  evidence  to  the  notion  that  the  EGG  is  a 
measure  of  the  contact  area  of  the  vocal  folds.   This  observation  is 
also  confirmed  by  our  model  in  Chapter  3. 
Measurement results. Tables 5.13 to 5.15 present measurements from
subjects with a normal larynx at different frequencies and
intensities. As expected, the duration of the closing phase varies
inversely with the frequency of phonation. More interestingly, the
duration of the closing phase decreases from 2.45 msec at F0 = 125 Hz
and low intensity phonation, to 2.18 and 2.06 msec at 125 Hz with
medium and high intensity phonations, respectively. The same
observations are valid for other frequencies. This indicates
a relationship between the duration of the closing phase and the
intensity of the produced sound. For subjects with a laryngeal
pathology,  Table  5.16  does  not  reveal  any  specific  relationship  or 
trend.  The  relationship  between  the  duration  of  the  closing  and  opening 
phase  is  discussed  in  the  section  on  the  speed  quotient  and  the  speed 
index. 

5.2.1.4  Closed  phase 

This  region  follows  the  region  of  the  closing  phase.  It  is  defined 
as  the  duration  between  the  instant  of  glottal  closure  and  the  instant 
of  glottal  opening.   The  rapid  fall  in  the  EGG  signal  indicates  the 
onset  of  this  region  since  the  glottal  closure  occurs  almost 
instantaneously  along  the  entire  length  of  the  vocal  folds.   This  is 
also  evident  in  our  EGG  simulation  using  our  model  discussed  in  Chapter 
3. The EGG in this region is usually not constant; after the region of
rapid fall in the EGG, the EGG continues to decrease, but at a slower
rate  until  it  reaches  a  minimum.  We  believe  that  at  this  minimum  point 
the  lateral  contact  area  is  maximum.   The  EGG  starts  to  increase  while 
the  folds  are  still  in  contact  along  their  entire  length,  indicating 
that  the  folds  are  separating  along  the  inferior  margins. 
Table 5.13  Closing phase measured in msecs from glottal area
for subjects with a normal larynx for low
intensity phonation

Int.                    Low
Freq. (Hz)      125     170     340
Subj.
JMN            2.66    2.10      NA
DMK              NA    2.00    1.31
AKK            1.88    2.07    1.76
GPM              NA    1.80    1.02

Average        2.45    1.99    1.36


Table 5.14  Closing phase measured in msecs from glottal area
for subjects with a normal larynx for medium
intensity phonation

Int.                  Medium
Freq. (Hz)      125     170     340
Subj.
JMN            2.13    1.85      NA
DMK            2.53    1.40    1.33
AKK            1.89    1.47      NA
GPM              NA      NA    1.31

Average        2.18    1.57    1.32
Table 5.15  Closing phase measured in msecs from glottal area
for subjects with a normal larynx for high
intensity phonation

Int.                   High
Freq. (Hz)      125     170     340
Subj.
JMN            2.50    2.00      NA
DMK            2.13    1.35      NA
AKK            1.56      NA    1.47
GPM              NA    2.10    0.89

Average        2.06    1.82    1.18


Table 5.16  Closing phase measured in msecs from glottal area
for subjects with a laryngeal pathology for
comfortable intensity phonation

Subj.              DJB     DJB     DJB     LMB     MXR     GTS     GTS     GTS
Task                 A       B       C       B       A       B       A       B
(Freq.)          (284)   (438)   (172)      NA      NA   (151)   (169)      NA

Area Closing
Time              0.83    0.98    1.75      NA      NA    3.07    1.30      NA
The discussion above points out an important feature of the EGG
signal.  It  is  clear  that  the  EGG  signal  is  very  sensitive  to  events 
occurring  during  glottal  closure.  These  events  cannot  be  observed  by 
other  methods.  In  Chapter  6  we  give  an  example  of  an  EGG  signal 
produced  by  a  subject  with  vocal  nodules.  The  EGG  waveshape  is 
dramatically  altered  due  to  the  presence  of  the  nodules. 

The  opening  and  closing  phases  of  the  area  function  are  usually 
asymmetric.  The  degree  of  asymmetry  depends  on  the  frequency  and  the 
intensity  of  vibration  of  the  folds.  The  relationship  between  the  two 
phases  is  important  and  is  expressed  as  a  parameter  called  the  speed 
quotient  (SQ).  We  will  discuss  this  parameter  more  fully  and  provide 
measurements  of  this  parameter  later  in  our  discussion. 

5.2.2  Speed  Quotient  and  Speed  Index 

Let SQ and SI denote speed quotient and speed index,
respectively. These parameters are defined from the area function as:

$$\mathrm{SQ} = \frac{\text{duration of opening phase}}{\text{duration of closing phase}} \tag{5.1}$$

and

$$\mathrm{SI} = \frac{\mathrm{SQ} - 1}{\mathrm{SQ} + 1} \tag{5.2}$$

Substituting for SQ in equation (5.2) by its value in equation (5.1), we can
derive an expression for SI in terms of the opening and closing phase:

$$\mathrm{SI} = \frac{\text{duration of opening phase} - \text{duration of closing phase}}{\text{duration of opening phase} + \text{duration of closing phase}} \tag{5.3}$$
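
As a small numerical check of equations (5.1) to (5.3), the sketch below computes the SQ and the SI from a pair of phase durations and confirms that the two expressions for the SI agree. The function names and the example durations are hypothetical and are not taken from the data base.

```python
def speed_quotient(opening, closing):
    """Equation (5.1): ratio of opening to closing phase duration."""
    return opening / closing

def speed_index_from_sq(sq):
    """Equation (5.2): speed index computed from the speed quotient."""
    return (sq - 1.0) / (sq + 1.0)

def speed_index(opening, closing):
    """Equation (5.3): speed index computed directly from the durations."""
    return (opening - closing) / (opening + closing)

opening_ms, closing_ms = 2.0, 1.6   # hypothetical durations in msec
sq = speed_quotient(opening_ms, closing_ms)
si_a = speed_index_from_sq(sq)
si_b = speed_index(opening_ms, closing_ms)
print(sq, si_a, si_b)
```

An opening phase longer than the closing phase gives SQ > 1 and SI > 0; equal durations give SQ = 1 and SI = 0.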
The durations of the opening phase and closing phase are related to
the  position  of  the  folds  within  the  larynx.  During  these  phases  the 
folds  usually  do  not  touch  each  other,  and  the  EGG  takes  on  an  almost 
constant  value.  Both  regions  (opening  and  closing)  in  the  area 
function,  therefore,  are  represented  or  mapped  into  one  region  in  the 
EGG  signal,  reflecting  one  of  the  differences  between  the  EGG  and  the 
area  function. 

It  is  clear  from  this  discussion  that  it  is  not  possible  to 
measure,  with  any  accuracy,  the  duration  of  the  opening  or  closing  phase 
from  the  EGG  or  DEGG  signal.  Consequently,  the  SQ  and  the  SI  cannot  be 
calculated.  However,  based  on  extensive  observations  of  the  available 
data,  we  note  that  the  maximum  in  the  area  function  tends,  as  expected, 
to  align  with  the  maximum  in  the  EGG  function.  This  observation  could 
be  used  as  the  basis  for  an  elaborate  algorithm  that  measures  or 
calculates  the  durations  of  the  opening  and  closing  phase  from  the  EGG 
or  DEGG  functions.  This  will  be  left  for  future  investigations. 

Measurement results. Tables 5.17 to 5.19 show the measurements for
subjects with a normal larynx. From the tables, the SQ tends to increase
with the intensity and to decrease with the frequency. The
average SQ increased from 0.97 at F0 = 125 Hz and low intensity to 1.18
at F0 = 125 Hz and high intensity; the same is true for the other
frequencies. Also, the SQ decreased from 0.97 at F0 = 125 Hz and low
intensity to 0.87 at F0 = 340 Hz and low intensity. The same
observation is true for other intensities. This confirms the
conclusions reached by Hildebrand [39]. Table 5.20 shows the
measurements from subjects with a laryngeal pathology. Although we did
Table 5.17  Speed quotient measurements from the area function
for subjects with a normal larynx for low intensity

Int.                    Low
Freq. (Hz)      125     170     340
Subj.
JMN            0.88    0.56      NA
DMK            0.98    0.88    0.74
AKK            1.05    0.88    1.13
GPM              NA    0.99    0.74

Average        0.97    0.82    0.87


Table 5.18  Speed quotient measurements from the area function
for subjects with a normal larynx for medium intensity

Int.                  Medium
Freq. (Hz)      125     170     340
Subj.
JMN            0.75    1.33      NA
DMK            0.79    1.06    0.71
AKK            1.07    0.84      NA
GPM              NA      NA    0.97

Average        0.87    1.07    0.84
Table 5.19  Speed quotient measurements from the area function
for subjects with a normal larynx for high intensity

Int.                   High
Freq. (Hz)      125     170     340
Subj.
JMN            1.12    0.58      NA
DMK            1.28    1.59      NA
AKK            1.14      NA    0.85
GPM              NA    1.19    1.51

Average        1.18    1.12    1.18


Table 5.20  Speed quotient measurements from the area function
for subjects with a laryngeal pathology

Subj.        DJB     DJB     DJB     LMB     MXR     GTS     GTS     GTS
Task           A       B       C       B       A       B       A       B
(Freq.)    (284)   (438)   (172)      NA      NA   (151)   (169)      NA

SQ          1.60    1.13    1.55      NA      NA    0.71    1.57      NA
not have control over the intensity, the SQ is in general inversely
proportional  to  the  frequency.  Also,  the  values  for  SQ  for  subjects 
with  a  laryngeal  pathology  are  somewhat  greater  than  for  subjects  with  a 
normal  larynx  indicating  more  asymmetry  in  the  vibratory  patterns  of  the 
folds  for  subjects  with  a  vocal  fold  pathology. 

The speed index (SI) is computed directly from the speed quotient
using equation (5.2). Theoretically, the SQ as defined in equation (5.1)
could vary from 0 to ∞, whereas the SI has a range of -1 to +1.
Although the relationship between the SQ and the SI is nonlinear, the
speed index shows the same trends as the SQ. For example, the
average SI increased from -0.025 at F0 = 125 Hz and low intensity to
0.08 at F0 = 125 Hz and high intensity. This is also true at other
frequencies. The SI decreased from -0.025 at F0 = 125 Hz and low
intensity to -0.12 at F0 = 340 Hz and low intensity. This is also true
at other intensities.
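
The range of the SI and its agreement in trend with the SQ both follow from the defining relation between the two quantities. As a brief check, for SQ ≥ 0:

$$
\mathrm{SI} = \frac{\mathrm{SQ}-1}{\mathrm{SQ}+1} = 1 - \frac{2}{\mathrm{SQ}+1},
\qquad
\lim_{\mathrm{SQ}\to 0}\mathrm{SI} = -1,
\quad
\mathrm{SI}\big|_{\mathrm{SQ}=1} = 0,
\quad
\lim_{\mathrm{SQ}\to\infty}\mathrm{SI} = +1 .
$$

Since $1 - 2/(\mathrm{SQ}+1)$ is strictly increasing in SQ, any increase or decrease in the SQ produces a change of the same sign in the SI, even though the mapping is nonlinear.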

Tables  5.21  to  5.23  show  the  SI  measurements  from  normal  subjects 
and  Table  5.24  shows  the  results  from  subjects  with  pathologies.  It  is 
interesting to note that the magnitudes of the SI values are consistently
larger  for  subjects  with  a  laryngeal  pathology  than  for  subjects  with  a 
normal  larynx. 

5.2.3  Fundamental  Frequency 

This  is  one  of  the  more  important  parameters  of  the  vibrating 
folds.  It  may  be  measured  from  the  area  function  as  the  inverse  of  the 
time  between  two  similar  events  in  the  glottal  cycle.  Traditionally 
these  events  are  either  the  opening  or  closing  instants.   However,  in 
Table 5.21  Speed index measurements from the area function
for subjects with a normal larynx for low intensity

Int.                    Low
Freq. (Hz)      125     170     340
Subj.
JMN           -0.07   -0.29      NA
DMK              NA   -0.07   -0.16
AKK            0.02    0.07   -0.05
GPM              NA    0.01   -0.15

Average       -0.03   -0.11   -0.12


Table 5.22  Speed index measurements from the area function
for subjects with a normal larynx for medium intensity

Int.                  Medium
Freq. (Hz)      125     170     340
Subj.
JMN           -0.14    0.14      NA
DMK           -0.12    0.27   -0.17
AKK            0.03   -0.09      NA
GPM              NA      NA   -0.07

Average       -0.07    0.11   -0.12
Table 5.23  Speed index measurements from the area function
for subjects with a normal larynx for high intensity

Int.                   High
Freq. (Hz)      125     170     340
Subj.
JMN            0.06   -0.27      NA
DMK            0.12    0.23      NA
AKK            0.07      NA   -0.08
GPM              NA    0.09    0.20

Average        0.08    0.02    0.06


Table 5.24  Speed index measurements from the area function
for subjects with a laryngeal pathology

Subj.        DJB     DJB     DJB     LMB     MXR     GTS     GTS     GTS
Task           A       B       C       B       A       B       A       B
(Freq.)    (284)   (438)   (172)      NA      NA   (151)   (169)      NA

SI          0.23    0.06    0.22      NA      NA   -0.18    0.22      NA
many instances, the glottal cycle does not have a complete closure, and
in  such  cases  similar  points  on  the  area  function  or  EGG  signal  can  be 
used  to  measure  the  fundamental  frequency  of  vibration. 

In  section  5.2.1.1  we  showed  that  the  opening  and  closing  instants 
can  be  measured  accurately  from  the  EGG  signal,  and  specifically  from 
the  differentiated  EGG.  The  minima  in  the  DEGG  were  associated  with  the 
closing  instant  and  the  maxima  of  the  DEGG  with  the  opening  instant. 

Letting the time between two successive minima be $T_0$, the
fundamental frequency $F_0$ is

$$F_0 = \frac{1}{T_0}$$

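
The period-to-frequency step can be illustrated as follows. The sketch assumes that closure instants are already available as sample indices (for example, the positions of successive DEGG minima) and that the sampling rate is known; the function name and the example indices are hypothetical.

```python
import numpy as np

def fundamental_frequency(closure_samples, fs):
    """Per-cycle F0 in Hz from successive closure instants (sample indices)."""
    t0 = np.diff(np.asarray(closure_samples, dtype=float)) / fs  # T0 in seconds
    return 1.0 / t0

# Closures spaced 80 samples apart at fs = 10 kHz, i.e. T0 = 8 msec.
f0 = fundamental_frequency([39, 119, 199, 279], fs=10000)
print(f0.mean())
```

Each pair of successive closures gives one period, so a record with N detected closures yields N - 1 frequency estimates, which can then be averaged or examined cycle by cycle.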

Measurement  results.  Tables  5.25  to  5.27  list  the  values  for  the 
fundamental  (pitch)  frequency  from  the  area  function  and  the  EGG.  The 
values  measured  from  the  area  function  are  taken  as  the  reference 
values.  The  values  measured  from  the  EGG  signal  are  compared  to  the 
reference  values  and  the  deviations  are  computed  as  a  percent  error  and 
listed  in  the  last  row  of  each  table. 
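
The percent error in the last row of each table can be reproduced as shown below. We assume here that the error is normalized by the area-function (reference) value; the function name is illustrative, and the numerical inputs are taken from Table 5.25.

```python
def percent_error(reference_hz, measured_hz):
    """Absolute deviation of the EGG value from the area value, in percent."""
    return 100.0 * abs(measured_hz - reference_hz) / reference_hz

# Area (reference) and EGG values from Table 5.25, low intensity.
jmn_low = percent_error(123.0, 123.4)   # subject JMN
gpm_low = percent_error(183.5, 142.9)   # subject GPM
print(round(jmn_low, 1), round(gpm_low, 1))
```

These two cases give 0.3% and 22.1%, matching the corresponding table entries.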

It  is  evident  from  the  tables  that  the  pitch  frequencies  measured 
from the EGG signal are very accurate over different ranges of
frequencies  and  intensities.  The  error  is  typically  less  than  1%  as 
compared  with  those  values  obtained  from  the  area  function.  The 
presence  of  a  pathology  does  not  seem  to  affect  the  accuracy  of  the 
pitch  measurements  from  the  EGG,  as  can  be  seen  from  Table  5.28. 

In  the  next  chapter  we  will  discuss  other  parameters  associated 
with  the  pitch  frequency  that  are  thought  to  indicate  the  presence  of  a 
pathology. 
Table 5.25  Fundamental frequency measurements for subjects with
            a normal larynx in Hz (target frequency = 125 Hz)

Subj.               JMN                  DMK                  AKK                  GPM
Int.         Low    Med   High    Low    Med   High    Low    Med   High    Low    Med   High
Area       123.0  125.0  120.5  117.2  117.2  114.5  186.6  198.0  189.4  183.5  161.3  141.6
EGG        123.4  125.0  121.2  117.6  116.7  115.8  186.6  200.0  190.1  142.9  176.4  143.5
%Error       0.3    0.0    1.0    0.3    0.4    1.1    0.0    1.0    0.4   22.1    9.4    1.3

Table 5.26  Fundamental frequency measurements for subjects with
            a normal larynx in Hz (target frequency = 170 Hz)

Subj.               JMN                  DMK                  AKK                  GPM
Int.         Low    Med   High    Low    Med   High    Low    Med   High    Low    Med   High
Area       163.9  165.3  172.4  163.4  161.3  162.6  156.2  214.3     NA  161.3  175.4  182.0
EGG        163.4  163.0  170.4  163.9  161.3  161.9  156.2  214.3     NA  159.4  169.5  189.6
%Error       0.4    1.4    1.2    0.3    0.0    0.4    0.0    0.0     NA    1.2    3.4    4.2

NOTE:  Subjects  attempted  to  phonate  at  fundamental  frequencies 
equal  to  the  target  frequencies.  However,  in  some  cases 
they  were  not  successful  as  portrayed  by  subject  GPM  for 
the  125  Hz  target  frequency. 
Table 5.27  Fundamental frequency measurements for subjects with
            a normal larynx in Hz (target frequency = 340 Hz)

Subj.               JMN                  DMK                  AKK                  GPM
Int.         Low    Med   High    Low    Med   High    Low    Med   High    Low    Med   High
Area       277.8  330.6  346.3  333.3  344.8  336.1  203.6  202.0  330.9  321.4  330.9  321.1
EGG        320.8  332.9  347.6  333.3  334.8  333.3  204.1  203.1  333.3  320.3  332.1  321.1
%Error      13.4    0.4    0.4    0.0    2.9    0.8    0.3    0.5    0.7    0.4    0.4    0.0

Table 5.28  Fundamental frequency measurements for subjects with a
            laryngeal pathology from the EGG signal and the area
            function

Subj.            DJB               LMB               MXR               GTS
Task             A        B        C        B        A        B        A        B
Area         284.7    438.8    172.5       NA       NA    151.5    169.6       NA
EGG          284.7    435.3    171.7       NA       NA    153.1    170.3       NA
%Error         0.0      0.7      0.4       NA       NA      1.0      0.4       NA
5.2.4  Open Quotient (OQ)

The open quotient is the ratio of the duration the vocal folds are
open (open phase) to the duration of the entire glottal cycle:

$$\mathrm{OQ} = \frac{\text{duration of open phase}}{\text{duration of entire glottal cycle}}$$

Traditionally, this parameter is measured from the area function. Let
t1 be the instant of opening and t2 be the instant of closure in the
area function, and let T0 be the period of the glottal cycle. Then

$$\mathrm{OQ} = \frac{t_2 - t_1}{T_0}$$

The open quotient can also be measured from the EGG signal and its
derivative (DEGG). Using the EGG signal, we first remove the dc level
in the EGG record by subtracting the mean. Once the dc level is
removed, we define the duration between consecutive negative-to-positive
(N-P) and positive-to-negative (P-N) EGG zero crossings as an estimate
of the open phase. The glottal period T0 is defined as the time
duration between consecutive positive-to-negative EGG zero crossings.
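The zero-crossing estimate described above can be sketched as follows. This is only an illustration: it assumes a uniformly sampled EGG in a NumPy array with the polarity used in this chapter (the EGG is positive during the open phase after the mean is removed), and the function name is hypothetical.

```python
import numpy as np

def open_quotient_zero_crossings(egg):
    """Per-cycle OQ from EGG zero crossings (N-P to next P-N, over P-N to P-N)."""
    egg = np.asarray(egg, dtype=float)
    egg = egg - egg.mean()                                      # remove the dc level
    rising = np.flatnonzero((egg[:-1] < 0) & (egg[1:] >= 0))    # N-P crossings
    falling = np.flatnonzero((egg[:-1] >= 0) & (egg[1:] < 0))   # P-N crossings
    oq = []
    for start, end in zip(falling[:-1], falling[1:]):
        # the N-P crossing inside this cycle marks the start of the open phase
        inside = rising[(rising > start) & (rising < end)]
        if inside.size == 0:
            continue
        oq.append((end - inside[0]) / (end - start))
    return np.array(oq)
```

Because the ratio is formed from sample counts within one cycle, the sampling rate cancels and is not needed as an argument.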

In our measurements we use the DEGG signal to measure both the open
phase and the glottal period. We define the minimum and maximum in the
DEGG as the instants of glottal closure and opening, respectively. In
this method, the open phase is set equal to the duration from a maximum
in the DEGG (opening) to the next minimum (closure). The glottal period
is defined as the time between two consecutive minima in the DEGG signal.
This method is more efficient than the previous one and it does not
require the removal of the dc level in the EGG signal.
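A minimal Python sketch of the DEGG-based measurement follows. It is an illustration under assumptions rather than the implementation used here: the DEGG is approximated by a first difference, cycles are located with zero crossings of the mean-removed EGG purely as a convenience for finding one minimum per cycle, and the function name is hypothetical.

```python
import numpy as np

def open_quotient_degg(egg):
    """Per-cycle OQ: DEGG maximum (opening) to next DEGG minimum (closure), over the period."""
    egg = np.asarray(egg, dtype=float)
    centered = egg - egg.mean()            # used only to delimit cycles
    degg = np.diff(egg)                    # the DEGG itself is unaffected by the dc level
    rising = np.flatnonzero((centered[:-1] < 0) & (centered[1:] >= 0))
    # one DEGG minimum (closure instant) per cycle
    closures = [int(a + np.argmin(degg[a:b]))
                for a, b in zip(rising[:-1], rising[1:])]
    oq = []
    for c0, c1 in zip(closures[:-1], closures[1:]):
        opening = c0 + int(np.argmax(degg[c0:c1]))   # DEGG maximum between closures
        oq.append((c1 - opening) / (c1 - c0))
    return np.array(oq)
```

As the text cautions, the result is only meaningful when a closed phase is known to exist.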
Observations made from high speed films indicate that the vibratory
pattern of the vocal folds does not always result in complete closure of
the glottis, and hence no true closed phase exists in the glottal
cycle. In many of these instances the resulting EGG still shows the
familiar waveshape. In such cases, the measurements as outlined above
will be in error since the open phase spans the entire glottal cycle.
Using the area function method, the open quotient will have a value of
one, whereas the DEGG method will yield a value less than one. Hence,
caution should be exercised when using the differentiated EGG method to
measure the open quotient; it should only be used when a closed phase
is known to exist.

Measurement results. Tables 5.29 to 5.31 present the open quotient
measurements at different frequencies and intensities for subjects with
a normal larynx. The tables indicate that the open quotient measured
from the EGG signal agrees with that measured from the area function
with an average error across subjects of about 11% for low intensity
phonation, about 13% for medium intensity phonation, and about 10% for
high intensity phonation. For most subjects the error is smallest at
high intensity, regardless of the fundamental frequency of phonation.
In addition, the error in the OQ measured from the EGG relative to the
area function is consistently smaller at medium frequencies, regardless
of the intensity of phonation.
Table 5.29  Open quotient measurements for subjects with a normal
            larynx from the EGG signal and the area function

Int.  Low

Subj.               JMN                  DMK                  AKK                  GPM
Freq.        125    170    340    125    170    340    125    170    340    125    170    340
Area        0.62   0.53     NA   0.66   0.61   0.76   0.72   0.61   0.75   0.84   0.57   0.89
EGG         0.56   0.53     NA   0.62   0.60   0.74   0.50   0.50   0.55     NA   0.59     0.
%Error      8.97   0.72     NA   7.28   1.67   1.96  29.70  17.30  27.20     NA   2.20   7.70


Table 5.30  Open quotient measurements for subjects with a normal
            larynx from the EGG signal and the area function

Int.  Medium

Subj.               JMN                  DMK                  AKK                  GPM
Freq.        125    170    340    125    170    340    125    170    340    125    170    340
Area        0.47   0.69     NA   0.53   0.67   0.79   0.77   0.53     NA     NA     NA   0.82
EGG         0.53   0.54     NA   0.64   0.62   0.72   0.56   0.55     NA     NA     NA   0.79
%Error     13.80  21.80     NA  19.70   6.70   7.80  27.80   3.20     NA     NA     NA   2.90

NOTE:  Results are rounded off to two decimal places.
Table 5.31  Open quotient measurements for subjects with a normal
            larynx from the EGG signal and the area function

Int.  High

Subj.               JMN                  DMK                  AKK                  GPM
Freq.        125    170    340    125    170    340    125    170    340    125    170    340
Area        0.64   0.54     NA   0.56   0.55     NA   0.61     NA   0.88   0.61   0.83   0.71
EGG         0.58   0.59     NA   0.60   0.62     NA   0.58     NA   0.68   0.67   0.79   0.72
%Error      8.50   7.90     NA   8.10  11.60     NA  11.10     NA  23.50   9.40   4.40   1.50


Table 5.32  Open quotient measurements for subjects with a
            laryngeal pathology from the EGG signal and the
            area function

Subj.            DJB               LMB               MXR               GTS
Task             A        B        C        B        A        B        A        B
(Freq.)      (284)    (438)    (172)       NA       NA    (151)    (169)       NA
Area          0.61     0.91     0.77       NA       NA     0.79     0.56       NA
EGG           0.39     0.41     0.44       NA       NA     0.74     0.49       NA
%Error       36.70    55.58    42.41       NA       NA     6.37    11.75       NA
It is apparent that the error in measuring the OQ from the EGG is
both subject and task dependent. For example, at a low intensity
phonation, the data from subject JMN has an error of 8.97% at F0 = 125 Hz
and an error of 0.72% at F0 = 170 Hz, while at medium intensity the data
from the same subject has an error of 13.8% at F0 = 125 Hz and an error
of 21.6% at F0 = 170 Hz. Also, for another subject, say GPM, the error
relationship for the OQ at low intensity for different frequencies is
different from that at high intensity phonations.

The values of the OQ for subjects with a normal larynx measured
from the area function at low and medium frequencies are consistently
smaller than those measured at a high frequency of phonation, regardless
of the intensity of phonation. The OQ values measured from the EGG,
in general, agree with this trend.

Table  5.32  presents  the  OQ  measurements  for  subjects  with  a 
pathology.  The  error  in  measuring  the  OQ  from  the  EGG  is  higher  than 
that  in  subjects  with  a  normal  larynx,  with  an  average  error  across 
subjects  of  30%  compared  to  the  values  mentioned  earlier  for  subjects 
with  a  normal  larynx.  A  significant  portion  of  this  error  is  due  to  the 
irregularities  in  the  shape  of  the  EGG  signal  for  subjects  with 
laryngeal  pathologies. 

5.2.5  EGG  Closing  Time 

We  define  this  parameter  as  the  time  interval  spanned  by  the 
segment  of  the  EGG  signal  that  has  a  rapid  decrease  (or  fall),  as 
indicated  in  Figure  5.3. 
As mentioned earlier, the rapid fall region in the EGG signal
indicates  the  onset  of  the  closed  phase  region  of  the  glottal  cycle,  as 
shown  in  Figure  5.3.  The  area  function  during  this  closed  phase  region 
takes  on  a  constant  value  of  zero  and  as  such  does  not  reflect  any  vocal 
fold  activity  during  this  phase.  On  the  other  hand,  the  EGG  is 
continuously  changing  during  this  phase,  reflecting  the  ongoing  activity 
of  the  vocal  folds. 

Figure  5.3  shows  that  the  minimum  in  the  differentiated  EGG  signal 
coincides  with  the  beginning  of  the  closed  phase  region  in  the  area 
function  and  the  beginning  of  the  rapid  fall  region  of  the  EGG.  After 
this  point  the  EGG  continues  to  decrease  but  at  a  slower  rate  until  it 
reaches  a  minimum.  This  slower  rate  of  change  is  reflected  in  an 
increase  in  the  DEGG  from  a  minimum  to  a  value  of  zero  once  the  EGG 
reaches  its  minimum,  and  levels  off  for  a  short  period  of  time. 

Observations  of  ultra  high  speed  films  of  the  vibratory 
characteristics  of  the  folds  show  that  the  EGG  reaches  its  minimum  after 
the  folds  have  already  closed.  We  have  argued  that  the  EGG  is  directly 
related  to  the  amount  of  contact  area  of  the  vocal  folds.  Based  on  this 
argument,  the  minimum  in  the  EGG  corresponds  to  the  time  the  lateral 
contact  area  reaches  its  maximum  value.  This  point  also  corresponds  to 
the time the DEGG starts to level off after going through a minimum. So
the EGG closure time parameter represents the interval from the time the
folds make initial contact to the time of maximum area of contact.

The  rationale  for  proposing  such  a  parameter  lies  in  the  fact  that 
the  closure  of  the  vocal  folds  is  the  dominant  source  of  excitation  of 
the  vocal  tract  during  voiced  sound  production.   Many  researchers  have 
argued that aspects of the vibration of the vocal folds are the major
factor  in  determining  voice  quality.  We  argue  that  the  EGG  closing  time 
parameter  is  directly  related  to  the  vocal  fold  closure  and  can  provide 
information  on  the  dynamics  and  nature  of  such  an  important  event. 
Further,  this  parameter  should  reflect  the  differences  between  the 
vibration  of  healthy  vocal  folds  and  ones  with  a  pathology.   If  a 
pathology  introduces  an  organic  change  to  the  structure  of  the  folds,  it 
could  manifest  itself  in  the  EGG  in  two  ways:   by  changing  the  EGG 
closing  time,  and  by  changing  the  waveshape  of  the  resulting  EGG  signal. 

It is important to note that this parameter (ECT) is different from
the area closing time (closing phase). During the closing phase of the
area function the folds are not in contact with each other and the
closing phase ends when the folds come together. On the other hand, the
EGG closing time region starts when the folds make contact and ends when
the maximum area of contact is achieved. An algorithm to automatically
measure the EGG closing time from an EGG record is described next.

Algorithm: EGG closing time. The closing instant of the vocal
folds is assumed to correlate with the minimum in the DEGG signal. This
minimum usually lies between two maxima (two glottal opening periods).
Using a threshold (THDEGG), computed as

$$\mathrm{THDEGG} = 0.1 \times \min(\mathrm{DEGG})$$

we find two points, one on each side of the minimum of the DEGG, that
have values greater than or equal to THDEGG. The time between these two
points is set equal to the EGG closing time (EGGCLT). The EGG signal is
checked to ensure that the value of the EGG is decreasing at the
beginning of the closing phase and is leveling off at the end of the
closing phase.

Algorithm

1.  Remove the dc level from the EGG record.

2.  Find the EGG closing instants and store them in array CLT.  Let the
    number of closing instants be NPTS.

    a)  Detect positive-to-negative (P-N) and negative-to-positive
        (N-P) zero crossings in the EGG signal and store them in arrays
        PN and NP, respectively.

    b)  Between successive N-P zero crossings, i.e., between NP(j) and
        NP(j+1), find the minimum in the DEGG signal, call it MDFEGG.
        So MDFEGG = DEGG(k).

    c)  Set CLT(j) = k.

3.  LOOP:

    For I = 1, NPTS

    a)  Set threshold THDEGG = 0.1*DEGG(CLT(I)).

    b)  Find the beginning of the closing phase BCLT(I)
        DO k = CLT(I), CLT(I-1), -1
            IF DEGG(k) >= THDEGG
            THEN BCLT(I) = k
                 GO to c
        END of DO

    c)  Find the end of the closing phase ECLT(I)
        DO k = CLT(I), CLT(I+1)
            IF DEGG(k) >= THDEGG
            THEN ECLT(I) = k
                 GO to d
        END of DO

    d)  END of LOOP
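The steps above can be sketched in Python. This is a hedged illustration, not the original program: it assumes a uniformly sampled EGG with sampling rate `fs`, approximates the DEGG by a first difference, and omits the final check that the EGG is decreasing at the start and leveling off at the end of the closing region. The function name and the `frac` parameter (the 0.1 factor in THDEGG) are introduced here for illustration.

```python
import numpy as np

def egg_closing_time(egg, fs, frac=0.1):
    """EGG closing time (seconds) for each detected closing instant."""
    egg = np.asarray(egg, dtype=float)
    egg = egg - egg.mean()                                  # step 1: remove dc
    degg = np.diff(egg)                                     # first-difference DEGG
    # step 2a: negative-to-positive zero crossings
    rising = np.flatnonzero((egg[:-1] < 0) & (egg[1:] >= 0))
    # step 2b-c: DEGG minimum between successive N-P crossings
    clt = [int(a + np.argmin(degg[a:b]))
           for a, b in zip(rising[:-1], rising[1:])]
    ect = []
    for i, k0 in enumerate(clt):                            # step 3
        th = frac * degg[k0]                                # 3a: THDEGG
        lo = clt[i - 1] if i > 0 else 0
        hi = clt[i + 1] if i + 1 < len(clt) else len(degg) - 1
        b = k0
        while b > lo and degg[b] < th:                      # 3b: search backward
            b -= 1
        e = k0
        while e < hi and degg[e] < th:                      # 3c: search forward
            e += 1
        ect.append((e - b) / fs)                            # time between the two points
    return np.array(ect)
```

Averaging the returned array over a record gives a single closing-time figure per record; whether the tabulated values were obtained that way is an assumption.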


Measurement  results.  Tables  5.33  to  5.35  present  measurements  of 
the  EGG  closure  time  for  subjects  with  a  normal  larynx  at  different 
frequencies  and  intensities  of  phonations.  The  measurements  show  that, 
in  general,  the  EGG  closure  time  decreases  with  increasing  intensity 
regardless  of  the  frequency  of  phonation.  Also,  with  the  exception  of 
one  subject,  the  EGG  closure  time,  in  general,  decreased  with  increasing 
fundamental  frequency  of  phonation. 

The decrease in closure time of the EGG indicates a faster rate of
closure for the folds. The correlation between higher intensity and
shorter closure time is significant and should shed some light on the
complex interaction between different factors during vocal fold
vibrations. This is obviously of interest to singers, where efficiency
of the voice is of prime interest.

In most subjects, the change in EGG closure time was more dramatic
when the frequency of phonation changed from medium to high frequency,
or when going from medium to high intensity of phonations. For example,
subject DMK has an EGG closing time of 1.18 msec at F0 = 170 Hz and 0.46
msec at F0 = 340 Hz both at low intensity phonation, compared to 1.0
Table 5.33  EGG closure time (in msecs) for subjects with a
            normal larynx from short term (ST) and long term
            (LT) EGG signal record

Int.  Low

Subj.               JMN                  DMK                  AKK                  GPM
Freq.        125    170    340    125    170    340    125    170    340    125    170    340
ST          0.60   0.70   0.59     NA   1.18   0.46   0.87   0.88   0.58   3.38   1.16   1.20
LT          0.64   0.72   1.51     NA   1.28   0.51   0.92   0.82   0.62   3.34   1.14   1.24
%Error      6.10   2.65   5.65     NA   7.49  10.49   5.67   6.31   6.49   1.11   1.84     3.
Table 5.34  EGG closure time (in msecs) for subjects with a
            normal larynx from short term (ST) and long term
            (LT) EGG signal record

Int.  Medium

Subj.               JMN                  DMK                  AKK                  GPM
Freq.        125    170    340    125    170    340    125    170    340    125    170    340
ST          0.60   0.66   1.35     NA   1.10   0.40   0.61   0.80   0.07   2.46   2.40   0.70
LT          0.63   0.59   1.32     NA   0.98   0.39   0.62   0.76   0.67   2.44   2.59   0.67
%Error      4.47  12.24   2.26     NA   1.94   3.26   0.17   4.67   5.11   0.65   7.19   3.74
Table 5.35  EGG closure time (in msecs) for subjects with a
            normal larynx from short term (ST) and long term
            (LT) EGG signal record

Int.  High

Subj.               JMN                  DMK                  AKK                  GPM
Freq.        125    170    340    125    170    340    125    170    340    125    170    340
ST          0.47   0.38   1.14   1.10   1.10   1.30   0.61     NA   0.59     NA   1.18   0.64
LT          0.49   0.44   1.18   1.10   1.03   0.36   0.61   0.65   0.58   2.27   1.18   0.67
%Error      4.40  12.90   4.00   0.00   6.80  16.70   0.70     NA   1.40     NA   0.60   4.50


Table 5.36  Average EGG closure time for subjects with a laryngeal
            pathology from short term (ST) and long term (LT) EGG
            record

Subj.            DJB               LMB               MXR               GTS
Task             A        B        C        B        A        B        A        B
ST            0.50     0.55     0.72       NA       NA     0.83       NA       NA
LT            0.45     0.55     0.71       NA       NA     0.89       NA       NA
%Error       11.80     0.34     0.84       NA       NA     7.20       NA       NA
msec at 170 Hz and 340 Hz at medium intensity phonations and 0.61 and
0.58 at 170 Hz and 340 Hz at high intensity phonation. Table 5.36
presents the results for subjects with a pathology. Again, the same
relationship between the EGG closure time and the frequency of
phonation is observed.

5.3  Short  Time  and  Long  Time  EGG  Records 

Tables 5.37 to 5.40 present measurements of the fundamental
frequency from long time and short time EGG signal records. The short
time (ST) records are of 30 msec duration, or 300 samples. The long
time (LT) records are 1.024 sec, or 10240 samples, long.
The last row presents the error between the long and short time records,
taking the results from the long time record as the reference.

The measurements of the fundamental frequency reveal that a very
small error occurs when using short time records instead of long time
records. The error ranges from a minimum of 0.1% to a maximum of 7%
for subjects with a normal larynx, and from a minimum of 0.1% to a
maximum of 3.3% for subjects with a pathology.

Tables  5.33  to  5.35  have  two  entries  for  the  EGG  closing  time:  ST 
and  LT.  Using  the  value  of  the  EGG  closure  time  measured  from  the  long 
data  records  (LT)  as  the  reference  value,  we  calculated  the  error  or 
discrepancy  in  the  results  when  the  EGG  closure  time  is  measured  from 
short  data  records.  The  calculated  percentage  error  is  listed  in  the 
last  row.  For  subjects  with  a  normal  larynx  the  percentage  error  ranged 
from  a  maximum  of  16.7%  to  a  minimum  of  0.0%  with  an  average  across 
subjects  of  5.2%  at  low  intensities  for  different  frequencies,  4.2%  at 
Table 5.37  Fundamental frequency measurements for subjects
            with a normal larynx from short time and long time
            EGG records for low intensity phonation

Int.  Low

Subj.               JMN                  DMK                  AKK                  GPM
Freq.        125    170    340    125    170    340    125    170    340    125    170    340
ST         123.4  163.9  320.8  117.6  163.9  333.3  186.6  156.2  204.1  142.9  159.4  320.3
LT         123.5  165.6  329.6     NA  163.6  336.3  187.0  156.6  203.2  140.2  162.3  328.1
%Error       0.1    1.3    2.7     NA    0.2    0.9    0.2    0.2    0.5    1.9    1.8    2.4


Table 5.38  Fundamental frequency measurements for subjects
            with a normal larynx from short time and long time
            EGG records for medium intensity phonation

Int.  Medium

Subj.               JMN                  DMK                  AKK                  GPM
Freq.        125    170    340    125    170    340    125    170    340    125    170    340
ST         125.0  163.0  332.9  116.7  161.3  334.8  200.0  214.3  203.1  176.4  169.5  332.1
LT            NA  164.8  336.3     NA  162.9  342.2  209.7  220.7  208.8  173.3  170.8   339.
%Error        NA    1.1    1.0     NA    1.0    2.2    4.3    2.9    2.7    1.8    0.7    2.3


Table 5.39  Fundamental frequency measurements for subjects
            with a normal larynx from short time and long time
            EGG records for high intensity phonation

Int.  High

Subj.               JMN                  DMK                  AKK                  GPM
Freq.        125    170    340    125    170    340    125    170    340    125    170    340
ST         121.2  170.4  347.6  115.8  161.9  333.3  190.1     NA  333.3  143.5  189.6  321.1
LT         125.0  173.1  344.4  116.9  163.7  346.1  192.9  179.0  335.7  140.7  181.8  345.4
%Error       3.0    1.6    4.0    0.9    1.1    3.7    1.4     NA    0.7    2.0    4.3    7.0


Table 5.40  Fundamental frequency measurements from short time
            and long time EGG records for subjects with a
            laryngeal pathology

Subj.            DJB               LMB               MXR               GTS
Task             A        B        C        B        A        B        A        B
(Freq.)      (284)    (438)    (172)       NA       NA    (151)    (169)       NA
ST           284.7    435.3    171.7       NA       NA    153.1    170.3       NA
LT           282.2    435.7    174.1       NA       NA    148.2       NA       NA
%Error         0.9      0.1      1.4       NA       NA      3.3       NA       NA
medium intensities for different frequencies, and 5.2% for high
intensity phonation at different frequencies. The overall average error
is on the order of 5%, regardless of the frequency or intensity of
phonation. It is clear that, in general, only a small error results
when using a short EGG data record for measuring the EGG closure time
for voiced sounds.

This  result  has  important  practical  implications.  It  indicates 
that  there  is  no  tangible  benefit  in  collecting  long  EGG  data  records 
from  subjects  for  measuring  the  EGG  closure  time.  This  translates  into 
a  significant  reduction  in  processing  time  and  computer  memory  storage 
required  for  storing  a  number  of  EGG  records  over  a  long  period  of  time 
for  each  subject. 

However, in the case of subjects with a laryngeal pathology, the
folds can exhibit a sudden asynchrony in vibration. An anomaly can
exist for a short time and is evident in both the speech and EGG
signal. An anomaly can be missed if we use a data record that is too
short. Also, averaging over a long data record may "wash out" the
effects of such anomalies.

5.4  Conclusions 


Two  principal  objectives  were  sought  in  this  chapter.  The  first 
was  to  provide  measurements  of  the  vocal  fold  vibratory  parameters.  The 
second  was  to  investigate  the  possibility  of  replacing  the  area  function 
by  the  EGG  signal.  Measurements  of  the  vocal  fold  vibratory  parameters 
were  presented,  accompanied  with  a  discussion  of  the  algorithms 
developed  to  carry  out  these  measurements.  These  measurements  represent 
the  first  such  computations  in  the  literature. 
We have established that the EGG signal can be used to measure most
of the vocal fold vibratory parameters with acceptable levels of
error. So far we have not presented measurements for the SQ from the
EGG signal; this will be a subject for further research.

The  EGG  signal  is  clearly  related  to  the  amount  of  lateral  contact 
of  the  folds.  This  fact  manifests  itself  in  many  ways.  For  instance, 
our  measurements  show  that  the  errors  in  measuring  the  instants  of 
glottal  closure  are  much  less  than  when  measuring  the  instants  of 
opening.  Physiologically,  the  closure  of  the  vocal  folds  is  a  more 
abrupt  phenomenon  than  the  opening  process  and,  as  such,  the  rate  of 
change  in  the  lateral  area  of  contact  is  much  greater.  Since  the  EGG  is 
sensitive  to  the  change  in  the  area  of  contact,  it  is  no  surprise  that 
the  EGG  marks  such  an  event  more  precisely  than  the  event  of  opening. 

We  found  that  the  opening  phase  measured  from  the  area  function  is 
inversely  proportional  to  the  frequency  of  vocal  fold  vibration.  At 
medium  intensities,  the  average  opening  time  stays  almost  constant  when 
the  fundamental  frequency  of  phonation  is  increased  from  low  to  medium 
range,  but  changes  substantially  when  going  from  a  medium  to  a  high 
frequency  of  phonation.  At  low  intensities  the  opposite  seems  to  be 
true,  and  at  high  intensity  phonations  the  change  is  uniform  when  going 
from  low  to  medium  and  from  a  medium  to  a  high  frequency  of  phonation. 

The closing phase, also measured from the area function, varied
inversely with the frequency and intensity of vocal fold vibration,
whereas the speed quotient and the speed index vary directly with the
intensity of phonation and inversely with the frequency of vibration.
The results show that the EGG measures the open quotient more
accurately  for  voiced  sounds  at  high  intensity  than  at  low  intensities, 
regardless  of  the  frequency  of  phonation.  However,  the  errors  in 
measurements  are  least  at  medium  frequencies,  so  we  recommend 
measurements  of  the  open  quotient  at  medium  frequencies  with  a  high 
intensity  of  phonation. 

The EGG closure time is an important vibratory parameter that seems
to indicate vocal fold efficiency. The closure time is inversely
proportional to the frequency and intensity of phonation. The closure
time changes dramatically when the frequency of phonation is changed
from medium to high frequency or when going from medium to high
intensity of phonation.

Finally,  we  demonstrated  that  short  data  records,  such  as  300 
sample  points,  are  sufficient  for  accurately  measuring  the  closure  time 
and  fundamental  frequency  of  phonation.  This  is  a  very  important  result 
with  important  practical  implications.  However,  this  result  appears  to 
hold  only  for  subjects  with  a  normal  larynx  and  not  for  patients  with  a 
laryngeal  pathology. 


CHAPTER  6 
FEATURE  EXTRACTION  FOR  LARYNGEAL  PATHOLOGY  DETECTION 


6.1  Introduction 


We  mentioned  earlier  that  laryngologists,  phoniatricians,  and 
speech  pathologists  rely  mostly  on  two  basic  techniques,  listening  to 
the  voice  and  viewing  the  larynx  with  the  aid  of  a  mirror  or  a 
laryngoscope. 

However,  the  idea  of  using  acoustic  analysis  for  laryngeal 
pathology  detection  is  very  appealing.  Not  only  is  it  a  noninvasive 
technique,  but  it  also  lends  itself  to  screening  a  large  population  for 
early  signs  of  voice  pathology.  This  technique  requires  minimum 
cooperation  of  the  subjects  and  analysis  can  be  made  from  tape 
recordings.  Due  to  these  factors  researchers  are  increasing  their 
efforts  in  this  direction. 

The  question  is:  Which  factors  of  the  acoustic  signal  indicate  the 
presence  of  a  pathology?  As  of  now  this  question  has  not  been  answered 
satisfactorily.  Lieberman  [41,42]  investigated  the  effects  of 
pathologies  on  the  pitch  period.  He  noted  that  pathological  voices  tend 
to  have  greater  variation  or  jitter  on  their  pitch  period.  Koike, 
working  along  the  same  lines,  noted  that  the  pitch  periods  vary  smoothly 
in  normal  voices  whereas  they  abruptly  change  in  the  pathologic  voices 
[43,44]. 
Recently, more emphasis has been directed toward the frequency or
spectral domain characteristics of the acoustic signal. Yanagihara
studied the hoarse voice and found that hoarseness can be the result of
four different conditions depending on the frequency band dominated by
the noise [45,46]. Frokjaer-Jensen, Prytz, and Kitzing used the
long-time-average spectra of the speech signal to study different
parameters such as F0, F1, etc. [47,48].

In  all  of  these  studies,  the  degree  of  accuracy  of  some  of  the 
measured  parameters  is  limited  because  these  parameters  are  measured 
from  the  speech  signal  itself.  This  usually  degrades  the  performance  of 
the  algorithms  used,  but  results  in  somewhat  accurate  measurements  in 
the  average  sense.  In  this  case  the  erratic  and  transient  behavior  that 
characterizes  some  pathologies  will  not  be  detected  due  to  the  inherent 
averaging  performed  by  these  methods. 

In  this  chapter  we  will  discuss  different  acoustic  parameters  for 
subjects  with  normal  and  abnormal  larynx.  Using  our  data  base,  we  have 
developed  algorithms  for  measurements  of  different  parameters  using  the 
EGG  signal.  The  algorithms  implemented  to  provide  these  measurements 
are  explained  in  Appendix  A.  We  also  suggest  a  new  time  domain  method 
based  on  the  probability  density  function  of  the  EGG  signal. 

6.2  Fundamental  Frequency 

Lieberman  and  Koike  suggested  three  parameters  related  to  the  pitch 
period  as  measures  for  detecting  the  presence  of  a  pathology.  The  pitch 
perturbation  and  the  pitch  perturbation  factor  were  used  by  Lieberman, 
whereas the relative average perturbation was used by Koike. Those
parameters were used to indicate the smoothness of the pitch period
contour  over  a  voiced  speech  segment.  The  larger  the  values  of  these 
parameters  the  higher  the  probability  of  laryngeal  pathology. 

In  the  past,  measurements  of  the  pitch  period  were  made  directly 
from  the  speech  signal  or  the  output  of  a  contact  microphone.  We 
contend  that  using  the  EGG  signal  will  improve  the  accuracy  of  these 
measurements.  In  the  following  sections  we  will  discuss  the  parameters 
measured  from  the  EGG  signal  in  our  data  base. 

6.2.1  Pitch  Perturbation 

Let P(i-1) and P(i) be the pitch periods during the (i-1)th and ith
periods, respectively. The pitch perturbation, therefore, is defined as

$$PP = \Delta P = |P(i) - P(i-1)| \tag{6.1}$$

6.2.2  Pitch  Perturbation  Factor 

The pitch perturbation factor is defined as

$$PPF = \frac{\text{number of } PP > 0.5 \text{ msec}}{\text{total number of pitch periods}} \tag{6.2}$$

6.2.3 Relative Average Perturbation (RAP)

This parameter is defined as

$$RAP = \frac{1}{N-2} \sum_{i=2}^{N-1} \left| \frac{P(i-1) + P(i) + P(i+1)}{3} - P(i) \right| \tag{6.3}$$

6.2.4 Average Percent Jitter

Let us define jitter as the variation of a pitch period i from the
average pitch period; then the average percent jitter is defined as

$$\%\ \text{jitter} = \frac{\text{sum of jitter of every cycle in the record}}{\text{total number of pitch cycles}} \times 100\% = \text{average jitter} \times 100\% \tag{6.4}$$

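As an illustration, the four measures of equations (6.1) through (6.4) could be computed from a list of pitch periods along the following lines. This is a minimal sketch and not the implementation described in Appendix A. The function names are hypothetical, the periods are assumed to be given in milliseconds, and the per-cycle jitter is assumed to be the absolute deviation from the mean period expressed as a fraction of that mean, which the text does not state explicitly.

```python
def pitch_perturbations(periods_ms):
    """Equation (6.1): |P(i) - P(i-1)| for every pair of adjacent periods."""
    return [abs(b - a) for a, b in zip(periods_ms, periods_ms[1:])]


def pitch_perturbation_factor(periods_ms, threshold_ms=0.5):
    """Equation (6.2): share of perturbations larger than the threshold."""
    large = sum(1 for pp in pitch_perturbations(periods_ms) if pp > threshold_ms)
    return large / len(periods_ms)


def relative_average_perturbation(periods_ms):
    """Equation (6.3): mean deviation of P(i) from its 3-point average."""
    n = len(periods_ms)
    total = 0.0
    for i in range(1, n - 1):
        smoothed = (periods_ms[i - 1] + periods_ms[i] + periods_ms[i + 1]) / 3.0
        total += abs(smoothed - periods_ms[i])
    return total / (n - 2)


def average_percent_jitter(periods_ms):
    """Equation (6.4), with jitter ASSUMED to be |P(i) - mean| / mean."""
    mean = sum(periods_ms) / len(periods_ms)
    jitter = [abs(p - mean) / mean for p in periods_ms]
    return 100.0 * sum(jitter) / len(jitter)
```

For example, a record of five periods `[8.0, 8.0, 9.0, 8.0, 8.0]` msec contains two perturbations above 0.5 msec, so its pitch perturbation factor is 2/5.
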
6.2.5  Measurement  Results 

In  general,  our  measurements  of  the  above  parameters  did  not 
present  a  clear  difference  between  subjects  with  laryngeal  pathologies 
and  subjects  with  normal  larynges.   The  measurements  show  that  if  we 
exclude  Tasks  at  the  high  pitch  frequency,  the  jitter  is  generally  less 
for  subjects  with  normal  larynges.   This  is  an  important  observation, 
since  it  indicates  that  jitter  measurements  should  be  performed  on 
phonations  at  low  or  medium  frequencies  in  both  types  of  population  when 
using  measurements  from  the  EGG  signal.   This  can  be  attributed  to  two 
possible  causes.   First,  the  EGG  signal  is  related  to  the  amount  of 
lateral  contact  between  the  folds.   The  contact  between  the  folds 
decreases  as  the  pitch  frequency  of  the  phonation  increases  resulting  in 
an  EGG  waveshape  that  is  more  sinusoidal.  In  this  case,  the  instants  of 
opening  and  closure  are  less  defined  resulting  in  deterioration  in  the 
performance  of  the  pitch  detection  algorithm.   This  can  be  overcome  by 
using  a  zero  crossing  based  algorithm  instead.   Second,  it  seems  that 
the mechanical system of the folds is more stable and the folds are more
in synchrony at low frequency phonations than at higher ones. So, to
prevent this inherent jitter from masking the pathology-induced jitter,
one should use a low or medium range of phonation frequencies.
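The zero crossing alternative mentioned above is not spelled out in the text. One possible form, given here only as a sketch under stated assumptions, removes the mean of the record, locates the upward zero crossings by linear interpolation between samples, and takes the spacing of successive crossings as the pitch period. The function name and the choice of upward crossings are assumptions made for illustration.

```python
def zero_crossing_periods(signal, sample_rate_hz):
    """Estimate pitch periods (msec) from upward zero crossings.

    Assumption: the waveform crosses its mean once per cycle in the
    upward direction, as a near-sinusoidal high-pitch EGG would.
    """
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]

    crossing_times = []
    for n in range(1, len(centered)):
        a, b = centered[n - 1], centered[n]
        if a < 0.0 <= b:
            # Linear interpolation between the two samples around the crossing.
            frac = -a / (b - a)
            crossing_times.append((n - 1 + frac) / sample_rate_hz)

    # Period = time between successive crossings, converted to msec.
    return [(t2 - t1) * 1000.0
            for t1, t2 in zip(crossing_times, crossing_times[1:])]
```

The resulting list of periods can be passed directly to perturbation or jitter measures such as those of Section 6.2.
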

Figures  6.1  and  6.2  show  graphs  of  the  instantaneous  fundamental 
frequency  variation  from  the  average  for  a  subject  with  a  normal  larynx 
and  a  subject  with  a  laryngeal  pathology,  respectively. 

We also measured the F0 variance for subjects with normal larynges
and  subjects  with  a  laryngeal  pathology.  It  was  found  that  if  high 
frequency  phonations  were  excluded,  the  variance  for  subjects  with 
normal  larynges  is  in  general  less  than  that  for  subjects  with  laryngeal 
pathologies. 

6.3  Probability  Mass  Function  Method 

So  far  most  laryngeal  pathology  detection  schemes  used  the  speech 
signal  as  the  sole  source  of  information  in  acoustic  analysis  methods. 
This  is  only  natural,  since  the  presence  of  a  laryngeal  pathology  can 
often  be  heard  when  the  patient  speaks.  It  is  generally  believed  that 
the  effects  of  a  pathology  on  the  vocal  folds  vibrations  are  the  cause 
of  the  abnormal  change  in  voice  quality.  After  all,  the  vibration  of 
the  folds  is  the  source  of  excitation  for  voiced  speech. 

This  being  the  case,  many  researchers  [79-83]  opted  to  find  the 
excitation  signal  itself  (volume-velocity  signal)  to  see  whether  the 
presence  of  a  pathology  altered  the  form  of  this  signal  as  compared  to  a 
"normal"  one.  Different  methods  were  devised  to  extract  this  signal 
from the speech signal; however, the underlying principle is the same.
The speech signal is inverse filtered to produce the volume velocity
(V-V) signal. This is carried out by removing or cancelling out the
effects  of  the  supraglottal  structure  (vocal  tract  filter)  from  the 
transduced  speech  signal. 

The  major  drawback  of  these  methods  is  the  difficulty  associated 
with  carrying  out  the  inverse  filtering  operation.  Add  to  that  the 
uncertainty  in  the  performance  of  these  methods,  i.e.,  there  is  no 
objective  way  for  deciding  whether  the  method  performed  correctly. 
These  drawbacks  have  excluded  inverse  filtering  from  the  clinical 
setting.  However,  recently  another  method  was  proposed  [84]  that  uses 
both  the  EGG  signal  and  the  speech  to  alleviate  some  of  these  problems. 

In  this  section  we  approach  the  problem  differently.  It  is  obvious 
that  the  information  about  the  source  (vibrator)  in  the  speech 
production  system  is  most  important.  We  claimed  earlier  that  the  EGG  is 
directly  related  to  the  vibration  of  the  folds  (source),  hence,  this 
signal  should  contain  some  of  the  sought  information.  It  turns  out  that 
this  claim  is  true.  Figure  6.3  presents  the  EGG  waveform  measured  from 
a  patient  with  a  nodule  located  near  the  middle  of  the  membranous  part 
of  one  of  the  folds.  It  is  clear  that  this  EGG  waveshape  differs  from 
that  of  a  normal  EGG  as  shown  in  Figure  6.4. 

After  viewing  the  corresponding  film  repeatedly,  we  can  explain  the 
behavior  portrayed  in  Figure  6.3.  The  folds  start  to  separate  when  the 
opening  phase  begins.  The  EGG  starts  its  usual  incline.  However,  as 
the  folds  continue  in  their  separation  beyond  the  location  of  the 
nodule, the nodule still maintains contact with the other fold. For a
short time, this contact compensates for the decrease in the area of
contact between the folds. The EGG signal remains constant over this
interval, and hence the odd shape that we see in the figure.

This observation led to our proposal of this new method for
laryngeal  pathology  detection.  It  is  clear  from  Figure  6.3  that  the 
time  domain  characteristics  of  the  EGG  can  indicate  the  presence  of  a 
pathology.  This  leads  to  a  multitude  of  questions,  such  as:  What  does 
the  EGG  signal  look  like  for  other  pathologies?  Is  it  possible  to 
associate  a  pathology  with  a  certain  (unique)  waveform?  Is  it  possible 
to  ascertain  the  extent  of  a  specific  pathology,  and  hence  quantify 
these  shapes? 

Given  the  fact  that  the  electroglottograph  is  an  inexpensive,  easy 
to  use  device,  these  questions  become  very  important  due  to  their 
implications;  i.e.,  similar  to  the  EKG,  every  clinic  and  speech 
pathologist  will  want  to  obtain  such  a  device.  Overdependency  and 
excessive  trust  in  an  unproven  device  could  result  in  tragic 
consequences. 

To  answer  these  questions,  a  more  thorough  study  should  be 
undertaken.  This  study  should  include  a  large  population  from  many 
known  laryngeal  diseases.  However,  in  this  section  we  tackle  some 
aspects  of  the  third  question  regarding  the  quantification  of 
pathologies. 

One way to measure odd behaviors such as the one in Figure 6.3 is
through measuring the probability density or mass function of the EGG
signal.

The  EGG  signal  is  present  only  when  the  vocal  folds  are 
vibrating.  This  corresponds  to  the  voiced  or  mixed  region  in  the  speech 
signal.  Makhoul  [85]  points  out  that  the  speech  signal  can  be 
considered a stationary process in this region. We make the same
assumption regarding the EGG signal. We can actually go a little
further by assuming that the EGG is a deterministic random process,
since it is possible to predict future values from past values for the
EGG of a normal subject. However, this is not true in the case of a
subject with a pathology. The presence of a pathology increases the
randomness of the EGG due to the unpredictable behavior of the vocal
folds. Bendat and Piersol [86] noted that one test of stationarity
is to consider the physics of the phenomenon producing the data. If
the basic physical factors that generate the phenomenon are time
invariant, then stationarity of the resulting data can generally be
accepted without further study.

For a stationary random process, the probability distribution
function is defined as

$$P(\mathrm{egg}) = \text{Probability}[\mathrm{EGG} \le \mathrm{egg}] \tag{6.5}$$

and the probability density function (PDF) as

$$p(\mathrm{egg}) = \frac{dP(\mathrm{egg})}{d\,\mathrm{egg}} \tag{6.6}$$

where in this case EGG is considered a continuous random process and egg
is a particular value of EGG(t).

However,  our  data  involve  the  digitized  EGG  signal.  Hence,  we  have 
to  consider  the  EGG  as  a  discrete  random  process.  An  equivalent 
statistical  measure  can  be  defined.  Instead  of  the  probability  density 
function we use the probability mass function (PMF) defined as

$$p(\mathrm{egg}) = \text{Prob.}[\mathrm{EGG}(n) = \mathrm{egg}] \tag{6.7}$$

where egg is a particular value of EGG(n) and EGG(n) is the digitized
EGG(t). The probability distribution function is now defined as

$$P(\mathrm{egg}) = \sum_{e \le \mathrm{egg}} p(e) \tag{6.8}$$

where the sum runs over all amplitude values e not exceeding egg, and
again the probability mass function can be considered as the
derivative of the probability distribution function. The PMF is more
appropriate than the distribution function for detecting amplitude
changes in the process. The question now is how do we measure this
probability mass function? This question is tackled in the following
section.

6.3.1  Computational  Aspects  of  the  PMF 

The  definitions  given  in  equations  (6.5-6.8)  assume  an  infinite 
number  of  sample  sequences  of  the  process,  in  this  case  the  EGG 
signal.  In  practice  it  is  impossible  to  satisfy  this  definition.  The 
customary  approach  is  to  assume  that  the  process  is  ergodic  (and  thus 
stationary),  so  the  time  average  of  a  single  trial  or  sample  is  equal  to 
the  statistical  average. 

This  assumption  permits  the  use  of  a  single  time  record  for 
computing  the  probability  mass  function.  The  computed  values  are 
estimates  of  the  true  ones.  Now  our  estimate  for  the  PMF  becomes 

$$p(\mathrm{egg}) = \frac{N(\mathrm{EGG}(n) = \mathrm{egg})}{N} \tag{6.9}$$

where N(EGG(n) = egg) is the number of data points that take on a value
equal to egg, and N is the record length.

The  data  base  used  in  this  study  was  obtained  by  digitizing  the  EGG 
and  speech  signals  as  explained  in  Chapter  4.  The  data  acquisition 
system  used  a  12-bit  A/D  converter.  In  this  case,  the  EGG  can  possibly 
take  on  4096  (-2048  to  2047)  different  amplitude  values.  In  practice 
the  signal  amplitude  is  amplified  to  take  advantage  of  the  dynamic  range 
of  the  A/D  converter.  However,  the  signal  usually  occupies  a  range  of 
values  less  than  the  full  range. 

If EGG(n) takes on values, say, in the interval [-i,j], where i and j
are positive integers representing the digital amplitude values, then we
can compute an estimate for the PMF and the distribution function.

Let m be an integer in the interval [-i,j], N be the total number
of data points, and N_m be the number of data points with values equal to
m; then

$$p(m) = \frac{N_m}{N} \tag{6.10}$$

and

$$P(m) = \sum_{n \le m} p(n) \tag{6.11}$$

We implemented equations (6.10) and (6.11) using the following algorithm.

6.3.2 Algorithm PMF and PDF

Let the EGG record be EGG(1), ..., EGG(N). Let PMF(egg) be the PMF
array, and let PD(egg) be the probability distribution function.

1. Remove any trends from the data. This step was performed as part
of the data collection and measurement system.

2. Find the maximum and minimum of the EGG amplitude, EGGMAX and
EGGMIN, respectively. This specifies the range of the PMF, and the
PMF array is indexed as such.

3. Compute the PMF as follows:

for I = 1 to N, Do

    PMF(EGG(I)) = PMF(EGG(I)) + 1

4. END of Do.

5. Compute the probability distribution function PD as follows:

for I = EGGMIN to EGGMAX
    PD(I) = PD(I-1) + PMF(I)

6. END.

Step  3  of  the  algorithm  says  that  a  location  in  array  PMF  is  incremented 
every  time  the  amplitude  of  the  EGG  is  equal  to  the  address  of  that 
array  location. 
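The steps above can be sketched in Python as follows. This is an illustrative version only, not the original program: the function name is hypothetical, the record is assumed to be a list of integer A/D values with trends already removed (step 1), and the counts are divided by N so that the result matches equation (6.10) rather than a raw histogram.

```python
def pmf_and_distribution(egg):
    """Relative-frequency PMF and cumulative distribution of an integer record.

    Returns (egg_min, pmf, pd), where pmf[k] and pd[k] refer to the
    amplitude value egg_min + k.
    """
    egg_min, egg_max = min(egg), max(egg)          # step 2: amplitude range
    counts = [0] * (egg_max - egg_min + 1)

    for value in egg:                              # step 3: count occurrences
        counts[value - egg_min] += 1

    n = len(egg)
    pmf = [c / n for c in counts]                  # equation (6.10)

    pd = []                                        # step 5: running sum, (6.11)
    running = 0.0
    for p in pmf:
        running += p
        pd.append(running)
    return egg_min, pmf, pd
```

Offsetting the array index by the minimum amplitude plays the role of indexing the PMF array from EGGMIN to EGGMAX in step 2.
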

Figure  6.5  presents  an  example  for  the  output  of  this  algorithm. 
The  output  shown  in  the  figure  is  simply  a  histogram  with  the  interval 
being  a  single  amplitude  point.  The  graph  shows  that  the  PMF  is  zero 
for  some  amplitudes  within  the  amplitude  range  spanned  by  the  EGG 
signal.  This  is  attributed  to  the  sampling  operation  we  employed  to 
digitize  the  EGG  signal.  Although  the  continuous  EGG  takes  on  the 
values of these amplitudes, the digitized EGG does not. We would have to
sample at an unnecessarily high sampling rate to ensure that a digitized
value is obtained for every possible amplitude value within the range
spanned by the EGG signal. This is not necessary.

We can accomplish the same results using two methods. The first
scheme  would  be  to  increase  the  interval  range  used  to  calculate  the  PMF 
itself,  i.e.  instead  of  one  point  we  could  use,  say  20  points,  and  find 
an  average  value  for  that  interval.  The  other  scheme  is  similar:  we 
simply  smooth  the  plot  by  using  a  moving  average  window  at  every  point 
of  the  PMF.  We  prefer  this  scheme  because  it  tends  to  represent  the 
distribution  more  accurately,  i.e.,  the  values  of  adjacent  points  will 
vary  smoothly  rather  than  being  exactly  equal  at  one  extreme  (when 
within  the  same  interval),  or  have  a  discontinuity  when  at  the  boundary 
of  two  adjacent  intervals,  as  is  the  case  in  the  first  scheme. 
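The preferred smoothing scheme can be illustrated with a short sketch. The window length, the handling of the end points (the window is simply truncated there), and the final renormalization are assumptions made for this example; the text does not specify them.

```python
def smooth_pmf(pmf, window=5):
    """Smooth a PMF with a centered moving-average window.

    Assumptions: `window` is odd, the window is truncated at the two
    ends of the array, and the result is rescaled to sum to one.
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    half = window // 2

    smoothed = []
    for k in range(len(pmf)):
        start = max(0, k - half)
        stop = min(len(pmf), k + half + 1)
        smoothed.append(sum(pmf[start:stop]) / (stop - start))

    total = sum(smoothed)
    return [v / total for v in smoothed]
```

Because every point gets its own window, adjacent values change gradually, which is the property that motivates preferring this scheme over fixed intervals.
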

Figure  6.6  shows  the  result  of  smoothing  the  graph  in  Figure  6.5. 
This  graph  is  the  PMF  of  the  EGG  waveform  shown  in  Figure  6.3. 

The  EGG  signal  for  a  normal  subject  has  a  regular  shape  and 
somewhat  resembles  a  deformed  sinusoidal  wave.  This  is  the  case  as 
shown in Figure 6.7. The PMF displayed in Figure 6.6 is decidedly different
and  reflects  the  abnormality  evident  in  the  time  domain  EGG  signal 
displayed  in  Figure  6.3. 

The  method  for  finding  an  estimate  of  the  PMF  used  above  is  based 
on  the  relative  frequency  method  for  PMF  estimation.  It  can  be  shown 
that  this  estimate  is  a  biased  estimate.  It  also  requires  a  large 
number of points to be accurate. In the following section we present
another method for estimating the PMF that requires fewer
points.

6.3.3 Relative Entropy Method

This  method  [87-90]  involves  calculating  a  posterior  estimate  of 
the  PMF  based  on  a  prior  estimate  of  the  PMF  and  present  information 
about  the  process.  This  present  information  is  represented  by  the 
moments of the process, i.e., the mean, mean-square, etc. These moments
can be easily measured from the available sampled data. If the prior
estimate is set to be uniform, this method is equivalent to the maximum
entropy method.

As  mentioned  earlier,  the  relative  frequency  method  requires  a 
large  number  of  points,  typically  on  the  order  of  thousands  to  obtain  an 
accurate  estimate  of  the  PMF.  Using  the  relative  entropy  method,  one 
needs  on  the  order  of  10  to  20  moments  or  less  to  apply  it  successfully. 

Traditionally,  using  this  method  involves  solving  for  n  distinct 
variables  with  n  non-linear  equations.  The  Newton-Raphson  algorithm  is 
usually applied to accomplish this task. It involves taking (or
estimating) the partial derivatives, using a Gaussian elimination
subroutine, and iteratively solving for the n unknowns simultaneously.
Instead,  we  present  an  efficient  algorithm  that  solves  for  one  variable 
at  a  time.  The  process  is  repeated  until  the  results  converge  to  some 
consistent  value. 

The principle of relative entropy is defined as the following
relationship between two probability functions p(x) and q(x):

$$H(q,p) = \sum_x q(x) \log \frac{q(x)}{p(x)} \tag{6.12}$$

where H(q,p) is defined as the relative entropy between the functions
q(x)  and  p(x).  The  function  p(x)  represents  the  prior  estimate  of  the 
function  q(x).  H(q,p)  can  be  viewed  as  a  distance  or  dissimilarity 
measure  between  p(x)  and  q(x).  So  our  objective  would  be  to  minimize 
H(q,p)  subject  to  certain  constraints.  In  this  case,  the  constraints 
are  in  the  form  of  the  nth  moments  defined  as 


$$\overline{x^n} = \sum_x x^n q(x) \tag{6.13}$$
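To make the two quantities concrete, the relative entropy of equation (6.12) and the moment constraint of equation (6.13) can be evaluated for discrete functions as sketched below. The function names are hypothetical, the natural logarithm is assumed, and terms with q(x) = 0 are taken to contribute nothing to the sum.

```python
import math


def relative_entropy(q, p):
    """H(q,p) of equation (6.12) for two PMFs given as equal-length lists.

    Assumes p(x) > 0 wherever q(x) > 0; terms with q(x) = 0 are skipped.
    """
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def moment(xs, q, n):
    """nth moment of equation (6.13): sum over x of x**n * q(x)."""
    return sum((x ** n) * qi for x, qi in zip(xs, q))
```

H(q,p) is zero when q and p coincide and grows as they differ, which is why it serves as the dissimilarity measure to be minimized.
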


Using the calculus of variations, we introduce the Lagrange multipliers to
the constraints and minimize the following expression:

$$\sum_x q(x) \log \frac{q(x)}{p(x)} + \sum_x q(x) \sum_{k=0}^{N} \lambda_k x^k \tag{6.14}$$

Taking the derivative of the above expression with respect to q(x) and
setting it equal to zero yields

$$\log \left[ \frac{q(x)}{p(x)} \right] + 1 + \sum_{k=0}^{N} \lambda_k x^k = 0 \tag{6.15}$$

so

$$q(x) = p(x) \exp \left[ -1 - \sum_{k=0}^{N} \lambda_k x^k \right] \tag{6.16}$$

where the $\lambda_k$'s are the Lagrange multipliers, and N is the total number
of moments calculated from the available data.

The above equation can be simplified by setting

$$\lambda_0 \leftarrow -1 - \lambda_0 \tag{6.17}$$

Also incorporating the minus sign in the Lagrange multipliers, equation
(6.16) becomes

$$q(x) = p(x) \exp \left[ \sum_{k=0}^{N} \lambda_k x^k \right] \tag{6.18}$$

Now

$$\sum_x q(x) = 1 \tag{6.19}$$

Using equation (6.18) we can rewrite equation (6.19) as

$$\sum_x p(x) \exp \left[ \sum_{k=0}^{N} \lambda_k x^k \right] = 1 \tag{6.20}$$

Equation (6.20) is the basis for our new algorithm.

Algorithm 

Equation (6.16) represents the solution to the problem. However,
the Lagrange multipliers ($\lambda$'s) are not known. So we revise this
equation to solve for one $\lambda$ at a time. Solving for $\lambda_0$, equation
(6.16), in the simplified form of equation (6.18), takes the form

$$q(x) = p(x) e^{\lambda_0} \tag{6.21}$$

The prior estimate, p(x), can be set as a uniform PMF (as is the
case in the maximum entropy method), or can be calculated from the data
using the relative frequency method. Using the constraint in equation
(6.19), equation (6.20) becomes

$$\sum_x p(x) e^{\lambda_0} = 1 \tag{6.22}$$

In this case the solution for $\lambda_0$ is straightforward. From equation
(6.22)

$$\lambda_0 = -\log \left[ \sum_x p(x) \right] \tag{6.23}$$


However, the prior p(x) should also satisfy the properties of a PMF
function, i.e., $\sum_x p(x) = 1$, so $\lambda_0$ should equal zero.

The next step in the algorithm is to "add" a Lagrange multiplier to
equation (6.21), and use a different constraint to solve for this new
multiplier. Equation (6.21) can be rewritten as

$$q(x) = p(x) e^{\lambda_0 + \lambda_1 x} \tag{6.24}$$


where $\lambda_1$ is now the only unknown. Substituting in equation (6.13) we
have

$$\sum_x x \, p(x) e^{\lambda_0 + \lambda_1 x} = \bar{x} \tag{6.25}$$


where the mean $\bar{x}$ is calculated from the available data.

Solving for $\lambda_1$ and the rest of the Lagrange multipliers is not
as simple as solving for $\lambda_0$, since the exponential expression is now
dependent on x. Here we use Newton's method, where

$$\lambda(n+1) = \lambda(n) - \frac{f(\lambda(n))}{f'(\lambda(n))} \tag{6.26}$$

and n indexes the Newton iterations.

However,  in  this  case  we  are  solving  only  for  one  unknown  using  one 
nonlinear  equation. 
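
As a worked illustration of this single-unknown step, and assuming that f is taken to be the residual of the first-moment constraint (6.25) (the text does not write f out explicitly), the function and its derivative with respect to the one unknown multiplier are

$$f(\lambda_1) = \sum_x x \, p(x) e^{\lambda_0 + \lambda_1 x} - \bar{x}, \qquad f'(\lambda_1) = \sum_x x^2 \, p(x) e^{\lambda_0 + \lambda_1 x}$$

Under this assumption f' is a sum of non-negative terms and is strictly positive whenever p(x) places some mass at a nonzero x, so the Newton update is well defined at every step.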

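Once $\lambda_1$ is found, one can solve for $\lambda_2$ using the equation

$$q(x) = p(x) e^{\lambda_0 + \lambda_1 x + \lambda_2 x^2} \tag{6.27}$$

where $\lambda_2$ is the only unknown. $\lambda_2$ is solved for using the second
moment

$$\sum_x x^2 p(x) e^{\lambda_0 + \lambda_1 x + \lambda_2 x^2} = \overline{x^2} \tag{6.28}$$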

Following  the  previous  procedure,  the  flow  chart  for  this  algorithm  is 
presented  in  Figure  6.8.  Once  all  the  Lagrange  multipliers  are  found, 
q(x)  is  evaluated  using  equation  (6.18).  This  q(x),  the  posterior 
estimate,  can  now  be  used  as  the  prior  estimate  p(x)  in  a  subsequent 
iteration.  The  algorithm  can  be  repeated  until  the  results  converge, 
where  in  this  situation,  convergence  is  achieved  when  the  mean  squared 
error  between  consecutive  iterations  is  less  than  a  specified  small 
value.

[Flowchart, "RELATIVE ENTROPY FLOWCHART": START WITH A PRIOR PMF
ESTIMATE -> SOLVE FOR LAMBDA 0 -> ADD ANOTHER LAGRANGE MULTIPLIER TO THE
SOLUTION -> SOLVE FOR THE NTH LAGRANGE MULTIPLIER USING THE NTH MOMENT ->
REPEAT UNTIL ALL LAGRANGE MULTIPLIERS ARE USED -> UPDATE PRIOR ESTIMATE
WITH THE POSTERIOR ESTIMATE -> MSE LESS THAN EPSILON? (NO: repeat;
YES: END)]

Figure 6.8 A new algorithm for the relative entropy method.
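
The procedure of Figure 6.8 can be sketched in Python as follows. This is an illustration written from the description in the text, not the original program. The function names, tolerances, and iteration limits are assumptions, and the amplitude values `xs` are assumed to have been scaled to a small range (for example -1 to 1) so that high powers of x do not overflow.

```python
import math


def sample_moments(data, num_moments):
    """Sample moments m_1 .. m_K of the data (mean, mean-square, ...)."""
    n = len(data)
    return [sum(v ** k for v in data) / n for k in range(1, num_moments + 1)]


def _solve_one_multiplier(xs, weights, k, target, tol=1e-12, max_steps=50):
    """Newton's method for the single unknown multiplier of x**k.

    Solves sum_x x**k * w(x) * exp(lam * x**k) = target for lam, where
    w(x) already contains the prior and all earlier multipliers.
    """
    lam = 0.0
    for _ in range(max_steps):
        terms = [w * math.exp(lam * x ** k) for x, w in zip(xs, weights)]
        f = sum((x ** k) * t for x, t in zip(xs, terms)) - target
        df = sum((x ** (2 * k)) * t for x, t in zip(xs, terms))
        step = f / df
        lam -= step
        if abs(step) < tol:
            break
    return lam


def relative_entropy_pmf(xs, prior, moments, eps=1e-14, max_iter=500):
    """Posterior PMF estimate found by solving one multiplier at a time."""
    p = list(prior)
    for _ in range(max_iter):
        lam0 = -math.log(sum(p))                      # equation (6.23)
        q = [pi * math.exp(lam0) for pi in p]
        for k, target in enumerate(moments, start=1):
            lam_k = _solve_one_multiplier(xs, q, k, target)
            q = [qi * math.exp(lam_k * x ** k) for x, qi in zip(xs, q)]
        # Mean squared error between consecutive estimates.
        mse = sum((qi - pi) ** 2 for qi, pi in zip(q, p)) / len(p)
        p = q                                         # posterior becomes prior
        if mse < eps:
            break
    return p
```

For instance, on the three amplitudes -1, 0, 1 with a uniform prior and target moments 0.2 (mean) and 0.6 (mean-square), the only PMF satisfying all constraints is (0.2, 0.4, 0.4), and the iteration approaches it. In practice the moments would come from `sample_moments` applied to the short data record.
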

Figure 6.6 presented the PMF of a subject with vocal nodules. This
PMF  was  computed  using  the  relative  frequency  method  and  a  data  record 
of  2560  points.  Since  this  is  a  long  data  record  it  is  safe  to  assume 
that  the  computed  PMF  is  very  close  to  the  true  or  actual  PMF.  Using  50 
data  points  of  the  same  record,  the  relative  frequency  method  output  is 
presented  in  Figure  6.9.  It  is  clear  that  the  result  is  not  a  good 
match  to  the  actual  PMF. 

Using  the  relative  entropy  method,  Figure  6.10  shows  the  output  PMF 
using  the  same  50  data  points  used  earlier.  The  method  used  20  moments 
computed  from  these  50  points  and  a  uniform  prior  estimate  (i.e., 
maximum  entropy  method).  Also,  Figure  6.11  shows  the  output  PMF  for  the 
same  50  data  points  using  the  relative  entropy  method  with  20  moments 
and  a  nonuniform  prior  estimate.  In  this  case  the  prior  estimate  used 
was  the  output  PMF  of  the  relative  frequency  method. 

Figures  6.10  and  6.11  indicate  that  the  relative  entropy  method  is 
superior  to  the  relative  frequency  method,  especially  for  short  data 
records. 

6.4  Discussion 

This  chapter  presented  a  new  method  for  feature  extraction  for 
detecting  laryngeal  pathologies.  This  method  relies  on  the  EGG  time 
domain representation as the basis for computing an estimated probability
mass function, which we believe can be instrumental in differentiating
between subjects with a normal larynx and subjects with a pathological
larynx.

Figure 6.9 The PMF estimate of the PMF in Figure 6.6
using the relative frequency method and a 50 point
data record.

Figure 6.10 The PMF estimate of the PMF in Figure 6.6
using the new relative entropy method with 50 data
points and 20 moments and a uniform prior estimate.

Figure 6.11 The PMF estimate of the PMF in Figure 6.6
using the new relative entropy method with 50 data
points and 20 moments and a nonuniform prior estimate.

The graph displayed in Figure 6.6 is clearly different from graphs
of normal subjects. The next step in this development would be to
devise a scheme that computes or specifies a certain distance measure or
criterion that quantifies this difference.

In  order  for  us  to  specify  a  successful  criterion  or  measure,  we 
need  to  conduct  more  experiments  and  record  the  EGG  signal  from  a  large 
number of subjects with various laryngeal pathologies. Only then can a
reliable procedure or algorithm be devised to assist in the
diagnostics of laryngeal pathologies.


CHAPTER  7 
RESULTS  AND  CONCLUSIONS 


7.1  Summary 


Three major accomplishments were achieved in this study. First,
we modeled the EGG signal; second, we presented measurements of vibratory
parameters of the vocal folds; and third, we developed a new feature
extraction method for detecting laryngeal pathology.

Modeling  of  the  EGG  was  discussed  in  Chapter  3.  There  we  showed 
that  we  can  accurately  model  the  EGG  signal  by  considering  the  physical 
system  that  produces  it,  namely  the  variation  in  the  contact  area 
between  the  vocal  folds.  We  accomplished  that  by  considering  the  output 
glottal area function from the Ishizaka-Flanagan model as an initial
step. Then by incorporating details of fold vibrations arrived at through
extensive observations of high speed films of the vocal fold vibrations,
we  were  able  to  refine  our  model  to  the  point  where  we  can  now  simulate 
an  EGG  that  closely  resembles  the  measured  or  actual  EGG. 

Also,  drawing  on  our  data  base  of  EGGs  collected  from  subjects  with 
a  pathological  larynx  and  subjects  with  a  normal  larynx  and  their 
corresponding  films,  we  were  able  to  simulate  some  pathologies,  such  as 
those  cases  of  subjects  with  a  nodule.  One  great  advantage  this  model 
provides  is  the  ability  to  simulate  the  EGG  for  different  nodule  sizes 
and  locations  on  the  vocal  folds.  This  flexibility  and  degree  of  detail 
may prove valuable in the diagnostics area. Other phenomena associated
with the fold vibrations were also modeled. The presence of excessive
mucus was simulated. The simulation results pointed out that even
subtle events such as the presence of mucus can be simulated accurately
using this model. The condition of vocal fry was also simulated. The
simulation provided further insight into the mechanism of vocal fry. This
is  an  interesting  observation  since  in  this  case  instead  of  providing 
the  conditions  (which  we  do  not  exactly  know)  to  produce  an  EGG  for  a 
vocal  fry,  we  actually  deduced  the  conditions  after  we  had  obtained  the 
simulated  waveform.  In  this  sense  the  model  should  prove  very  useful  as 
part  of  an  articulatory  voice  synthesis  and  transmission  system  by 
solving  for  the  laryngeal  configuration  using  the  actual  EGG  signal. 
Also,  the  model  provides  a  powerful  educational  and  diagnostics  tool. 

The  high  degree  of  detail  in  our  model  should  allow  us  to  simulate 
other  pathologies  and  thus  aid  in  creating  an  atlas  or  dictionary  of 
possible  EGG  waveshapes  corresponding  to  different  laryngeal 
pathologies. 

Our  unique  data  base  of  subjects  with  a  normal  larynx  and  subjects 
with  a  laryngeal  pathology  enabled  us  to  study  and  measure  different 
vibratory  parameters.  We  were  able  to  calculate  specific  values  or 
range  of  values  for  each  of  these  parameters.  Also,  measured  parameters 
from  subjects  with  a  normal  larynx  were  compared  with  those  measured 
from  subjects  with  a  pathologic  larynx.  This  comparison  provides  a  road 
map  to  follow  or  improve  upon  in  the  future.  This  study  cannot  claim 
that  the  measured  values  or  range  of  values  for  each  of  these  parameters 
is  absolute,  but  it  does  provide  figures  with  which  future  studies  can 
compare.  However,  this  study  points  out  certain  important  trends  and 
differences between the pathologic and normal larynx.

A new vibration parameter was proposed. This parameter, called EGG
closing  time  or  "ECT",  is  measured  directly  from  the  EGG  signal.  ECT  is 
related  to  the  duration  between  the  time  the  folds  initiate  contact  and 
the  time  of  maximum  contact.  This  interval  is  of  great  interest  since 
it  is  associated  with  the  glottal  closure.  So  ECT  and  the  EGG  in 
general  can  be  utilized  to  study  vocal  fold  efficiency,  which  is  of 
interest  to  singers  and  other  voice  specialists,  or  to  study  other 
events  occurring  during  glottal  closure  and  the  closed  phase. 

An  important  result  related  to  the  measurement  issue  is  our  finding 
that  short  time  EGG  records  can  provide  accurate  estimates  of  the 
vibration  parameters.  However,  we  do  caution  against  such  estimates  in 
the  presence  of  a  pathology. 

Finally,  we  presented  our  method  for  differentiating  between  a 
normal  larynx  and  a  pathologic  one  based  on  the  time  domain 
representation  of  the  EGG  signal.  We  found  that  the  probability  mass 
function  (PMF)  measured  from  the  EGG  of  subjects  with  normal  larynges 
can  be  very  different  from  that  measured  from  the  EGG  of  subjects  with 
pathologic  larynges.  Hence,  it  is  feasible  to  use  this  function  as  a 
discriminatory  test  as  part  of  a  preliminary  screening  procedure.  If 
the  initial  result  indicates  the  subject  has  an  abnormal  PMF,  more 
detailed  tests  can  be  administered  to  pinpoint  any  possible  ailment. 

7.2  Future  Research 

Throughout  this  study  many  problems  were  identified  that  we  were 
not  able  to  pursue.  In  Chapter  5  we  pointed  out  that  our  model  has  some 
inherent imperfections, such as not accounting for the deformation of
the folds when colliding with each other. The collision of the folds
may  increase  the  lateral  contact  area  and  thus  will  affect  the  shape  of 
the  EGG.  Different  vibrational  modes  result  in  different  types  of 
contact  or  collisions  between  the  folds.  Based  on  our  conjecture  that 
the  EGG  is  related  to  the  contact  area,  one  would  expect  different  EGG 
waveshapes  for  different  collisions,  so  we  need  to  investigate  this 
effect  more  thoroughly. 

Our  simulations  have  been  mainly  for  a  male  voice  in  the  modal 
register.  Simulations  of  other  registers  and  types  of  voices  such  as  a 
female  or  a  child's  voice  should  be  investigated.  The  results  of  such 
simulations  should  help  to  further  refine  our  model. 

One important aspect of these studies and simulations is solving
the inverse problem. At this point we have simulated the EGG waveform
from the area function derived from the simulated fold positions using
the Ishizaka-Flanagan model. It should be possible to extend the model
so that it solves for the fold positions, or laryngeal configuration,
from a measured EGG signal. The ability to perform such analysis could
prove valuable to an articulatory voice synthesis and transmission
system.

The  development  of  an  objective  screening  test  for  laryngeal 
pathology  detection  is  an  important  goal.  The  vocal  folds  play  a  major 
part  in  the  human  speech  production  system;  thus  any  abnormality 
affecting  their  functioning  will  affect  this  system  and  its  output,  the 
speech  signal.  The  first  step  is  to  measure  the  parameters  associated 
with  the  folds'  vibrations.  In  Chapter  5  we  used  the  EGG  to  measure 
some of these parameters as an alternative method to the present
area-based procedure. We were not able to measure the opening or closing
phase from the EGG signal since both regions occur when the folds are
not in contact. However, observations of the synchronized EGG and area
function point to a certain correlation between the maximum amplitude in
the EGG and the maximum amplitude in the area function. It appears that
it is possible to develop an algorithm that can measure these parameters
from the EGG signal alone. The ability to measure all the vibration
parameters from the EGG signal would eliminate the arduous task of
digitizing high-speed laryngeal films to obtain the area function
currently needed to measure these parameters. We need to investigate
this possibility further.

Our  suggested  method  for  using  the  PMF  in  laryngeal
pathology  detection  should  be  investigated  further.  The  method  can  be 
implemented  easily  as  part  of  an  on-line  feature  extraction  system  for 
laryngeal  pathology  detection. 


APPENDIX 
MEASUREMENT  ALGORITHMS 

In  this  appendix  we  provide  details  of  the  algorithms  used  in 
measuring  different  vibratory  parameters. 

We  first  discuss  the  algorithm  for  measuring  the  opening  and 
closing  phase  followed  by  the  algorithm  for  measuring  the  speed  quotient 
and  speed  index.  Finally,  we  discuss  the  algorithm  for  measuring  pitch 
jitter,  pitch  perturbation,  pitch  perturbation  factor,  and  the  relative 
average  perturbation. 

A.1  Opening  and  Closing  Phase

This  algorithm  is  applied  to  the  area  function  only.  However,  this 
algorithm  can  be  applied  to  the  EGG  signal  once  a  correlation  is 
established  between  the  maximum  in  the  area  function  and  a  specific 
point  in  the  EGG  signal.  At  present,  we  are  in  the  process  of  devising 
such  an  algorithm. 

Algorithm 

Let  the  area  function  record  be  Area(1),  ...,  Area(N).

1.  Locate  the  opening  and  closing  instants,  ITOPEN  and  ITCLOS, 
respectively,  throughout  the  record. 

2.  Let  M  be  the  number  of  consecutive  pairs  of  opening  and  closing 
instants.  Then 
for  I  =  1  to  M,  DO

a.  Locate  ITMAX(I),  the  instant  of  maximum  area  value  located 
between  two  consecutive  opening  and  closing  instants 
(ITOPEN(I),  ITCLOS(I)). 

b.  Calculate  the  opening  and  closing  phase  time  as 
opening  phase  time  =  OPT(I)  =  ITMAX(I)  -  ITOPEN(I) 
closing  phase  time  =  CLOSTIME(I)  =  ITCLOS(I)  -  ITMAX(I). 


3.   END. 
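
The steps above can be sketched in Python as follows. This is a minimal illustration, not the original implementation: the opening and closing instants of step 1 are assumed to be already available as sample indices (how they are located is not specified here), indices are 0-based rather than 1-based, and the function name is hypothetical.

```python
def opening_closing_phase(area, itopen, itclos):
    """Return per-cycle opening and closing phase times, in samples.

    area   : sequence of glottal area samples (Area(1)..Area(N), 0-based here)
    itopen : sample indices of the opening instants, ITOPEN(I)
    itclos : sample indices of the matching closing instants, ITCLOS(I)
    """
    opt, clostime = [], []
    for t_open, t_close in zip(itopen, itclos):
        # Step 2a: ITMAX(I), the instant of maximum area within the cycle
        segment = area[t_open:t_close + 1]
        t_max = t_open + max(range(len(segment)), key=segment.__getitem__)
        # Step 2b: opening and closing phase times
        opt.append(t_max - t_open)
        clostime.append(t_close - t_max)
    return opt, clostime


# Two synthetic cycles of an area function (made-up values)
area = [0, 0, 1, 3, 6, 4, 2, 0, 0, 0, 2, 5, 3, 1, 0]
opt, clostime = opening_closing_phase(area, itopen=[1, 9], itclos=[7, 14])
print(opt, clostime)  # [3, 2] [3, 3]
```

The times are in samples; dividing by the sampling rate of the area record converts them to seconds.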


A.2  Speed  Quotient  and  Speed  Index

The speed quotient is defined as

$$ SQ = \frac{\text{duration of opening phase}}{\text{duration of closing phase}} $$

and the speed index is defined as

$$ SI = \frac{SQ - 1}{SQ + 1}. $$



The  algorithm  for  measuring  the  opening  and  closing  phase  is  needed 
to  provide  the  input  values  for  this  algorithm.  At  present,  this 
algorithm  is  applied  to  the  area  function  only. 

Algorithm 

Let OPT be the array containing the opening phase data, and
CLOSTIME be the array containing closing phase data.

1.   Let NCYS be the number of glottal cycles in the available data
record.

2.   Compute the speed quotient and speed index for every cycle:
for I = 1 to NCYS

SQ(I) = OPT(I) / CLOSTIME(I)

SI(I) = (SQ(I) - 1) / (SQ(I) + 1)

3.   END.
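
A minimal Python sketch of this computation is given below. It takes the per-cycle opening and closing phase arrays as input; the function name is hypothetical, and the closing phase of every cycle is assumed to be non-zero.

```python
def speed_quotient_and_index(opt, clostime):
    """Return per-cycle speed quotient SQ(I) and speed index SI(I).

    opt      : opening phase durations, one per glottal cycle
    clostime : closing phase durations, one per glottal cycle (non-zero)
    """
    sq, si = [], []
    for o, c in zip(opt, clostime):
        q = o / c                          # SQ = opening / closing
        sq.append(q)
        si.append((q - 1.0) / (q + 1.0))   # SI = (SQ - 1) / (SQ + 1)
    return sq, si
```

Since SI = (SQ - 1)/(SQ + 1), the speed index always lies between -1 and 1 and is zero for a symmetric cycle, whereas SQ is unbounded as the closing phase shortens.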


A.3  Jitter,  Pitch  Perturbation,  Pitch  Perturbation  Factor,  and  Relative
Average  Pitch  Perturbation 


This  algorithm  is  applied  to  both  the  area  and  EGG  data  records.

Algorithm

1.  Locate  instants  of  opening  and  closing  and  find  NCYS,  the  number  of 
glottal  cycles  in  the  record. 

2.  Compute  the  period  of  each  vibratory  cycle  and  store  in  array 
APERD. 

3.  Compute  the  average  period. 

4.  Jitter  measurements: 

a.  Compute  the  absolute  difference  between  the  period  of  each 
cycle  and  the  average  period.  We  define  this  as  the  jitter. 

b.  From step a, compute the variance and standard deviation of the
jitter.

c.  Compute the overall percent jitter, where

percent jitter = (sum of jitter of every cycle in the record
                  / number of cycles measured) x 100%

               = average jitter x 100%

5.  Pitch Perturbation measurements:

for I = 1 to NCYS-1

APERT(I) = | APERD(I) - APERD(I+1) |

END

i.e., the pitch perturbation is the absolute difference between
consecutive periods. The difference is stored in array APERT.

6.  Pitch Perturbation Factor measurements:

IAPERT = 0

for I = 1 to NCYS-1

if | APERT(I) | >= 0.5 msec
then IAPERT = IAPERT + 1

END

pitch perturbation factor = IAPERT / (NCYS-1)

7.  Compute the Relative Average Perturbation (RAP):

sum = 0.0

for I = 2 to NCYS

sum = sum + | (APERD(I-1) + APERD(I) + APERD(I+1)) / 3 - APERD(I) |

END

RAP = sum / (NCYS-1)

8.  DONE.
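
The measurements of steps 3 to 7 can be sketched in Python as shown below. This is an illustration under stated assumptions rather than the original implementation: the periods are assumed to be already extracted (steps 1 and 2) and given in msec, the variance is taken as the population variance, the percent scaling of step 4c is omitted, and the RAP sum runs only over cycles that have both neighbours and is divided by the number of terms, which differs slightly from the loop bounds written in step 7. The function and key names are hypothetical.

```python
import math


def perturbation_measures(aperd, threshold=0.5):
    """Jitter and pitch perturbation measures from cycle periods (msec)."""
    ncys = len(aperd)
    if ncys < 3:
        raise ValueError("at least three cycles are needed")

    # Step 3: average period
    mean_period = sum(aperd) / ncys

    # Step 4: jitter = |period - average period| for every cycle
    jitter = [abs(p - mean_period) for p in aperd]
    mean_jitter = sum(jitter) / ncys
    var_jitter = sum((j - mean_jitter) ** 2 for j in jitter) / ncys
    std_jitter = math.sqrt(var_jitter)

    # Step 5: pitch perturbation between consecutive periods
    apert = [abs(aperd[i] - aperd[i + 1]) for i in range(ncys - 1)]

    # Step 6: fraction of perturbations of at least `threshold` msec
    ppf = sum(1 for d in apert if d >= threshold) / (ncys - 1)

    # Step 7: deviation of each period from its three-point average
    terms = [abs((aperd[i - 1] + aperd[i] + aperd[i + 1]) / 3.0 - aperd[i])
             for i in range(1, ncys - 1)]
    rap = sum(terms) / len(terms)

    return {"mean_period": mean_period, "mean_jitter": mean_jitter,
            "var_jitter": var_jitter, "std_jitter": std_jitter,
            "apert": apert, "ppf": ppf, "rap": rap}


measures = perturbation_measures([8.0, 8.5, 8.25, 8.25])
```

The same function applies to periods measured from either the area function or the EGG record, since only the list of cycle periods is required.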